CENTRAL LIMIT THEOREM AND INFLUENCE FUNCTION FOR THE
MCD ESTIMATORS AT GENERAL MULTIVARIATE DISTRIBUTIONS 

By Eric A. Cator and Hendrik P. Lopuhaä

Delft University of Technology 

We define the minimum covariance determinant functionals for 
multivariate location and scatter through trimming functions and 
establish their existence at any multivariate distribution. We provide 
a precise characterization including a separating ellipsoid property 
and prove that the functionals are continuous. Moreover, we establish
asymptotic normality for both the location and covariance estimators
and derive the influence function. These results are obtained in a very 
general multivariate setting. 



1. Introduction. Consider the minimum covariance determinant (MCD) estimator introduced
in [19], i.e., for a sample $X_1, X_2, \ldots, X_n$ from a distribution $P$ on $\mathbb{R}^k$ and $0 < \gamma < 1$,
consider subsamples $S \subset \{X_1, \ldots, X_n\}$ that contain $h_n \geq \lceil n\gamma \rceil$ points. Define a corresponding
trimmed sample mean and sample covariance matrix by

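$$\widehat{T}_n(S) = \frac{1}{h_n}\sum_{X_i \in S} X_i, \qquad \widehat{C}_n(S) = \frac{1}{h_n}\sum_{X_i \in S}\bigl(X_i - \widehat{T}_n(S)\bigr)\bigl(X_i - \widehat{T}_n(S)\bigr)'. \qquad (1.1)$$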

Let $S_n$ be a subsample that minimizes $\det(\widehat{C}_n(S))$ over all subsamples of size $h_n \geq \lceil n\gamma \rceil$,
where $\lceil x \rceil$ denotes the smallest integer greater than or equal to $x \in \mathbb{R}$. Then the pair
$(\widehat{T}_n(S_n), \widehat{C}_n(S_n))$ is an MCD estimator. Today, the MCD estimator is one of the most popular
robust methods to estimate multivariate location and scatter parameters. These estimators, 
in particular the covariance estimator, also serve as robust plug-ins in other multivariate 
statistical techniques, such as principal component analysis [6, 21], multivariate linear regres- 
sion [1, 20], discriminant analysis [11], factor analysis [16], canonical correlations [22, 25], 
errors-in-variables models [8], invariant co-ordinate selection [24], among others (see also [12]
for a more extensive overview). For this reason, the distributional and the robustness properties
of the MCD estimators are essential for conducting inference and performing robust estimation
in several statistical models.
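To make definition (1.1) concrete, the following Python sketch computes an MCD estimator by exhaustive search over all subsamples of size $h = \lceil n\gamma \rceil$ (the size that Proposition 4.2 shows a minimizing subsample has). It assumes NumPy is available; `trimmed_mean_cov` and `brute_force_mcd` are hypothetical helper names, not part of any library, and the search over all $\binom{n}{h}$ subsets is only feasible for very small $n$.

```python
import itertools
import math

import numpy as np


def trimmed_mean_cov(x):
    """Trimmed sample mean and covariance of a subsample, as in (1.1).

    Uses the 1/h normalization of (1.1), not the unbiased 1/(h - 1).
    """
    h = x.shape[0]
    t = x.mean(axis=0)
    centered = x - t
    return t, centered.T @ centered / h


def brute_force_mcd(x, h):
    """Minimize det(C_n(S)) over all subsamples S of size h (exhaustive search)."""
    best_det, best = np.inf, None
    for idx in itertools.combinations(range(x.shape[0]), h):
        t, c = trimmed_mean_cov(x[list(idx)])
        d = np.linalg.det(c)
        if d < best_det:
            best_det, best = d, (t, c, idx)
    return best  # (T_n(S_n), C_n(S_n), indices of S_n)


rng = np.random.default_rng(0)
sample = rng.standard_normal((10, 2))
gamma = 0.5
h = math.ceil(len(sample) * gamma)  # h_n = ceil(n * gamma) = 5
t_mcd, c_mcd, s_mcd = brute_force_mcd(sample, h)
```

Practical algorithms avoid the exhaustive search; the sketch only mirrors the definition.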

The MCD estimators are known to have the same breakdown point as the minimum volume 
ellipsoid estimators [19], and for a suitable choice of $\gamma$ they possess the maximal breakdown
point possible for affine equivariant estimators (e.g., see [1, 15]). However, their asymptotic 
properties, such as the rate of convergence, limit distribution and influence function, are not 
fully understood. Within the framework of unimodal elliptically contoured densities, Butler,
Davies and Jhun [3] show that the MCD location estimator converges at $\sqrt{n}$-rate towards a
normal distribution with mean equal to the MCD location functional. The rate of convergence
and limit distribution of the covariance estimator still remain an open problem. Croux and
Haesbroeck [5] give the expression for the influence function $\mathrm{IF}(x; C, P)$ of the MCD covariance
functional $C(P)$ at distributions $P$ with a unimodal elliptically contoured density and use
this to compute limiting variances of the MCD covariance estimator. However, existence,
continuity and differentiability of the MCD functionals at perturbed distributions are implicitly
assumed, but not proven. Moreover, the computation of the limiting variances via the influence
function relies on the von Mises expansion, i.e.,

$$C(P_n) = C(P) + \frac{1}{n}\sum_{i=1}^n \mathrm{IF}(X_i; C, P) + o_P(n^{-1/2}), \qquad (1.2)$$

which has not been established. The distributional and robustness properties of robust
multivariate techniques that make use of the MCD depend on the distributional and robustness
properties of the MCD estimator, in particular those of the MCD covariance estimator. Despite
the incomplete asymptotic theory for the MCD, at several places in the literature one
prematurely assumes either a $\sqrt{n}$ rate of convergence or asymptotic normality of the MCD
covariance estimator, or uses the influence function of the MCD covariance functional to
investigate the robustness of the specific multivariate method and to determine limiting variances
based on the heuristic (1.2).

This paper is meant to settle these open problems and extend the asymptotic theory for the 
MCD estimator in a very general setting that allows a wide range of multivariate distributions. 
We will define the MCD functional by means of trimming functions that belong to a wide
class of measurable functions. Our trimmed functionals have similarities with the trimmed
$k$-means considered in [10, 7, 9]. However, minimization over the $k$ means in their variation
functional is done separately from the minimization over the class of trimming functions. This considerably facilitates
compactness arguments that are used to establish existence and continuity for the functionals 
and moreover, in contrast with our MCD functionals, their approach yields functionals that are 
not affine equivariant. Nevertheless, these authors also recognized the advantage of employing 
a flexible class of trimming functions, which allows a uniform treatment at general probability 
measures, including empirical measures and perturbed measures needed for our purposes. 
We believe that obtaining our results for general multivariate distributions is an important 
contribution of this paper. To justify this claim, we will give several important examples of 
models where it is essential to study the MCD estimator for a class of distributions that is 
wider than the elliptically contoured distributions.

We prove existence of the MCD functional for any multivariate distribution P and provide 
a separating ellipsoid property for the functional. Furthermore, we prove continuity of the 
functional, which also yields strong consistency of the MCD estimators. Finally, we derive 
an asymptotic expansion of the functional, from which we rigorously derive the influence 
function, and establish a central limit theorem for both MCD-estimators. We would like to 
emphasize that all results are obtained under very mild conditions on P and that essentially 
all conditions are satisfied for distributions with a density. For distributions with an elliptically
contoured density that is unimodal we do not need any extra condition and recover the results 
in [3] and [5] as a special case (see [4]). 

The paper is organized as follows. In Section 2 we define the MCD functional for general
underlying distributions, discuss some of its basic properties and provide examples of models
where it is essential to study the behavior of the MCD estimator for underlying distributions that
are beyond elliptically contoured distributions. In Section 3 we prove existence of the MCD
functional and establish a separating ellipsoid property. Section 4 deals with continuity of
the MCD functionals and consistency of the MCD estimators. Finally, in Section 5 we obtain
an asymptotic expansion of the MCD estimators and MCD functional, from which we prove 
asymptotic normality and determine the influence function. In order to keep things readable, 
all proofs and technical lemmas have been postponed to an appendix at the end of the paper. 

2. Definition. Let $P$ be a probability measure on $\mathbb{R}^k$. To define an MCD functional at $P$
we start by defining a trimmed mean and trimmed covariance functional in the following way.
For a measurable function $\phi : \mathbb{R}^k \to [0,1]$ define

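$$T_P(\phi) = \frac{\int x\,\phi(x)\,P(dx)}{\int \phi\,dP}, \qquad C_P(\phi) = \frac{\int (x - T_P(\phi))(x - T_P(\phi))'\,\phi(x)\,P(dx)}{\int \phi\,dP}. \qquad (2.1)$$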

The function $\phi$ determines the trimming of the mean and covariance matrix. For $\phi \equiv 1$, the
above functionals are the ordinary mean and covariance matrix corresponding to $P$. When
$P = P_n$, the empirical measure, and $\phi = \mathbb{1}_S$ for a subsample $S$, we recover (1.1). Next, we fix
a proportion $0 < \gamma < 1$ and require $\phi$ to have at least mass $\gamma$, i.e.,

$$\int \phi\,dP \geq \gamma.$$

To ensure that the functionals in (2.1) are well defined, we take $\phi$ in the class

$$K_P(\gamma) = \Bigl\{\phi : \mathbb{R}^k \to [0,1] : \phi \text{ measurable},\ \int \phi\,dP \geq \gamma,\ \int \|x\|^2\phi(x)\,P(dx) < \infty\Bigr\}.$$

If there exists $\phi_P \in K_P(\gamma)$ which minimizes $\det(C_P(\phi))$ over all $\phi \in K_P(\gamma)$, then the
corresponding pair

$$(T_P(\phi_P), C_P(\phi_P))$$

is called an MCD functional at $P$. Note that, although for $\phi \in K_P(\gamma)$ the functionals in (2.1)
are well defined, the existence of a minimizing $\phi$ is not guaranteed. Furthermore, if a
minimizing $\phi$ exists, it need not be unique.
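For a measure $P$ with finitely many atoms the integrals in (2.1) reduce to weighted sums, which makes it easy to check the two special cases mentioned above ($\phi \equiv 1$, and $\phi = \mathbb{1}_S$ at $P_n$). The sketch below assumes NumPy and represents $P$ by hypothetical arrays `points` (the atoms) and `weights` (their masses), with `phi` holding the values of the trimming function at the atoms; for such $P$ the second-moment condition in $K_P(\gamma)$ holds automatically.

```python
import numpy as np


def trimmed_functionals(points, weights, phi):
    """T_P(phi) and C_P(phi) of (2.1) for P = sum_i weights[i] * delta_{points[i]}."""
    points = np.asarray(points, dtype=float)
    w = np.asarray(weights, dtype=float) * np.asarray(phi, dtype=float)
    mass = w.sum()  # the denominator: integral of phi dP
    t = w @ points / mass
    centered = points - t
    c = (w[:, None] * centered).T @ centered / mass
    return t, c


def in_k_class(weights, phi, gamma):
    """Membership of phi in K_P(gamma); finitely many atoms give finite second moments."""
    phi = np.asarray(phi, dtype=float)
    in_range = bool(np.all((phi >= 0) & (phi <= 1)))
    return in_range and float(np.dot(weights, phi)) >= gamma


x = np.array([[0.0, 0.0], [1.0, 2.0], [2.0, 1.0], [3.0, 3.0], [9.0, -4.0], [4.0, 5.0]])
p_n = np.full(len(x), 1 / len(x))        # empirical measure P_n
phi_all = np.ones(len(x))                # phi = 1: ordinary mean and covariance
phi_s = np.array([1, 1, 1, 1, 0, 1.0])   # phi = 1_S, where S drops the fifth point
t_all, c_all = trimmed_functionals(x, p_n, phi_all)
t_s, c_s = trimmed_functionals(x, p_n, phi_s)
```

With `phi_s`, the functionals coincide with the trimmed sample mean and covariance (1.1) of the five retained points.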

To complete our definitions, note that each trimming function $\phi$ determines an ellipsoid
$E(T_P(\phi), C_P(\phi), r_P(\phi))$, where for each $\mu \in \mathbb{R}^k$, $\Sigma$ symmetric positive definite, and $\rho > 0$,

$$E(\mu, \Sigma, \rho) = \{x \in \mathbb{R}^k : (x - \mu)'\Sigma^{-1}(x - \mu) \leq \rho^2\}, \qquad (2.2)$$

and

$$r_P(\phi) = \inf\{s > 0 : P(E(T_P(\phi), C_P(\phi), s)) \geq \gamma\}. \qquad (2.3)$$

If a minimizing trimming function $\phi_P$ exists, then $E(T_P(\phi_P), C_P(\phi_P), r_P(\phi_P))$ is referred to
as a "minimizing" ellipsoid.
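For a discrete $P$, the map $s \mapsto P(E(\mu, \Sigma, s))$ in (2.3) is a right-continuous step function (the ellipsoid in (2.2) is closed), so the infimum is attained at the smallest Mahalanobis distance at which the accumulated mass reaches $\gamma$. The NumPy sketch below uses that observation; `mahalanobis_sq`, `in_ellipsoid` and `radius_r` are hypothetical names, and in the text $(\mu, \Sigma)$ would be $(T_P(\phi), C_P(\phi))$.

```python
import numpy as np


def mahalanobis_sq(points, mu, sigma):
    """(x - mu)' Sigma^{-1} (x - mu) for every row x of points."""
    diff = np.asarray(points, dtype=float) - mu
    solved = np.linalg.solve(sigma, diff.T).T  # rows are Sigma^{-1}(x - mu)
    return np.einsum("ij,ij->i", diff, solved)


def in_ellipsoid(points, mu, sigma, rho):
    """Membership in the closed ellipsoid E(mu, Sigma, rho) of (2.2)."""
    return mahalanobis_sq(points, mu, sigma) <= rho ** 2


def radius_r(points, weights, mu, sigma, gamma):
    """inf{s > 0 : P(E(mu, Sigma, s)) >= gamma} of (2.3) for a discrete P.

    Assumes gamma does not exceed the total mass of P.
    """
    dist = np.sqrt(mahalanobis_sq(points, mu, sigma))
    order = np.argsort(dist)
    cum_mass = np.cumsum(np.asarray(weights, dtype=float)[order])
    j = int(np.searchsorted(cum_mass, gamma))  # first index with cum_mass >= gamma
    return float(dist[order][j])


pts = np.array([[1.0, 0.0], [0.0, 2.0], [-3.0, 0.0], [0.0, -4.0]])
w = np.full(4, 0.25)
r = radius_r(pts, w, mu=np.zeros(2), sigma=np.eye(2), gamma=0.6)
```

Here the distances are 1, 2, 3, 4 and the accumulated mass first reaches 0.6 at the third point, so the radius is 3.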

Note that the functionals in (2.1) are affine equivariant in the following sense. Fix a
nonsingular $k \times k$ matrix $A$ and $b \in \mathbb{R}^k$ and let $h(x) = Ax + b$, for $x \in \mathbb{R}^k$. If $X \sim P$, then
$AX + b \sim Q = P \circ h^{-1}$. It is straightforward to see that $\phi \in K_Q(\gamma)$ if and only if $\phi \circ h \in K_P(\gamma)$,
which yields

$$T_Q(\phi) = A\,T_P(\phi \circ h) + b \quad\text{and}\quad C_Q(\phi) = A\,C_P(\phi \circ h)\,A',$$

as well as $r_Q(\phi) = r_P(\phi \circ h)$. Furthermore, $\phi_Q$ minimizes $\det(C_Q(\phi))$ over $K_Q(\gamma)$ if and only
if $\phi_P = \phi_Q \circ h$ minimizes $\det(C_P(\phi))$ over $K_P(\gamma)$. This means that if an MCD functional
exists, it is affine equivariant, i.e., $T_Q(\phi_Q) = A\,T_P(\phi_P) + b$ and $C_Q(\phi_Q) = A\,C_P(\phi_P)\,A'$.
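The equivariance identities above can be checked numerically for a discrete $P$: if $P$ has atoms $x_i$, then $Q = P \circ h^{-1}$ has atoms $h(x_i)$ with the same masses, and the value of $\phi$ at $h(x_i)$ is the value of $\phi \circ h$ at $x_i$, so a single vector of trimming values serves both measures. The sketch assumes NumPy; `trimmed_functionals` is a hypothetical helper restating (2.1), and the random $A$, $b$ and trimming values are arbitrary test inputs.

```python
import numpy as np


def trimmed_functionals(points, weights, phi):
    """T_P(phi), C_P(phi) from (2.1) for a discrete P."""
    w = np.asarray(weights, dtype=float) * np.asarray(phi, dtype=float)
    t = w @ points / w.sum()
    centered = points - t
    return t, (w[:, None] * centered).T @ centered / w.sum()


rng = np.random.default_rng(1)
x = rng.standard_normal((8, 3))             # atoms of P
weights = np.full(8, 1 / 8)
phi_vals = rng.uniform(0.2, 1.0, size=8)    # phi o h at x_i, equivalently phi at h(x_i)

A = rng.standard_normal((3, 3)) + 3 * np.eye(3)  # almost surely non-singular
b = np.array([1.0, -2.0, 0.5])
y = x @ A.T + b                             # atoms of Q = P o h^{-1}, with h(x) = Ax + b

t_p, c_p = trimmed_functionals(x, weights, phi_vals)  # functionals of phi o h at P
t_q, c_q = trimmed_functionals(y, weights, phi_vals)  # functionals of phi at Q
```

In addition, $\det(C_Q(\phi)) = \det(A)^2\det(C_P(\phi \circ h))$, which is why minimizers over $K_Q(\gamma)$ and $K_P(\gamma)$ correspond under $h$.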

Butler et al. [3] define the MCD functional by minimizing over all indicator functions $\mathbb{1}_B$
of measurable bounded Borel sets $B \subset \mathbb{R}^k$ with $P(B) = \gamma$. These indicator functions form a
subclass of $K_P(\gamma)$ that is sufficiently rich when one considers unimodal elliptically contoured
densities. However, at perturbed distributions $P_{\varepsilon,x} = (1-\varepsilon)P + \varepsilon\delta_x$, where $\delta_x$ denotes the Dirac
measure at $x \in \mathbb{R}^k$, their MCD functional may not exist. Croux and Haesbroeck [5] solve this
problem by minimizing over all functions $\mathbb{1}_B + \delta\mathbb{1}_{\{x\}}$ with $x \notin B$ and $P(B) + \delta P(\{x\}) = \gamma$.

These functions form a subclass of $K_P(\gamma)$ that is sufficiently rich when one considers
single-point perturbations of unimodal elliptically contoured densities, but the class $K_P(\gamma)$ allows
for functions other than $\mathbb{1}_B + \delta\mathbb{1}_{\{x\}}$ for which the determinant of the covariance functional is
strictly smaller. Moreover, minimization over the more flexible class $K_P(\gamma)$ allows a uniform
treatment of the functionals in (2.1) at general probability measures, including measures with
atoms. Important examples are the empirical measure $P_n$ corresponding to a sample from $P$,
in which case the functionals relate to the MCD estimators, and perturbed measures $P_{\varepsilon,x}$, for
which the functionals need to be investigated in order to determine the influence function. It
should be noted that our Theorem 3.2 does show that a minimizer in the Croux-Haesbroeck
sense exists for all distributions $P$, but this is not at all obvious beforehand.
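Perturbed measures are easy to represent when $P$ is discrete: $P_{\varepsilon,x}$ scales the existing masses by $1-\varepsilon$ and adds an atom of mass $\varepsilon$ at $x$. As a minimal illustration of how such measures enter an influence function, the sketch below (assuming NumPy; `perturb` and `mean_functional` are hypothetical names) evaluates the difference quotient $(T(P_{\varepsilon,x}) - T(P))/\varepsilon$ for the untrimmed mean, i.e. $\phi \equiv 1$ in (2.1), whose influence function is the standard $x - T(P)$. It does not compute the MCD influence function, which requires the minimizing $\phi$ at each $P_{\varepsilon,x}$.

```python
import numpy as np


def perturb(points, weights, x, eps):
    """Atoms and masses of P_{eps,x} = (1 - eps) P + eps * delta_x for a discrete P."""
    new_points = np.vstack([points, x])
    new_weights = np.append((1 - eps) * np.asarray(weights, dtype=float), eps)
    return new_points, new_weights


def mean_functional(points, weights):
    """T_P(phi) of (2.1) with phi = 1, i.e. the ordinary mean of P."""
    weights = np.asarray(weights, dtype=float)
    return weights @ points / weights.sum()


atoms = np.array([[0.0, 0.0], [2.0, 0.0], [0.0, 2.0], [2.0, 2.0]])
masses = np.full(4, 0.25)
x_out = np.array([10.0, -5.0])

t_p = mean_functional(atoms, masses)
quotients = []
for eps in (0.1, 0.01, 0.001):
    t_eps = mean_functional(*perturb(atoms, masses, x_out, eps))
    quotients.append((t_eps - t_p) / eps)  # equals x_out - t_p up to rounding
```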

Definition (2.1) might suggest that minimization of $\det(C_P(\phi))$ is hindered by the fact
that the denominator depends on $\phi$. However, the following property shows that if a
minimum exists, it can always be achieved with a denominator in (2.1) equal to $\gamma$. Its proof is
straightforward from definition (2.1).

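Lemma 2.1 For any $0 < \lambda \leq 1$ and $\phi \in K_P(\gamma)$, such that $\lambda\phi \in K_P(\gamma)$, we have

$$T_P(\lambda\phi) = T_P(\phi), \quad C_P(\lambda\phi) = C_P(\phi), \quad\text{and}\quad r_P(\lambda\phi) = r_P(\phi).$$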
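Lemma 2.1 can be seen directly from (2.1): the factor $\lambda$ appears in both numerator and denominator and cancels, and $r_P$ depends on $\phi$ only through $T_P(\phi)$ and $C_P(\phi)$. The NumPy sketch below checks this on a discrete $P$ with $\lambda = \gamma / \int\phi\,dP$, the choice that produces a trimming function of mass exactly $\gamma$; `trimmed_functionals` is a hypothetical helper restating (2.1).

```python
import numpy as np


def trimmed_functionals(points, weights, phi):
    """T_P(phi), C_P(phi) from (2.1) for a discrete P."""
    w = np.asarray(weights, dtype=float) * np.asarray(phi, dtype=float)
    t = w @ points / w.sum()
    centered = points - t
    return t, (w[:, None] * centered).T @ centered / w.sum()


rng = np.random.default_rng(2)
pts = rng.standard_normal((12, 2))
mass = np.full(12, 1 / 12)
phi = rng.uniform(0.5, 1.0, size=12)  # a trimming function with mass >= 0.5
gamma = 0.5
lam = gamma / float(mass @ phi)       # 0 < lam <= 1, and lam * phi has mass gamma

t1, c1 = trimmed_functionals(pts, mass, phi)
t2, c2 = trimmed_functionals(pts, mass, lam * phi)
```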

Since we can always construct a minimizing $\phi$ in such a way that $\int \phi\,dP = \gamma$ (take $\lambda = \gamma / \int \phi\,dP$
in Lemma 2.1), it is tempting to replace the term $\int \phi\,dP$ in (2.1) by $\gamma$. However, we will not do so,
in order to keep enough flexibility for the functionals at probability measures $P$ and trimming
functions of the type $\phi = \mathbb{1}_B$ for measurable $B \subset \mathbb{R}^k$ with $P(B) \geq \gamma$. An important example is
the situation where $P$ is the empirical measure.

2.1. Examples of non-elliptical models where the MCD is relevant. We will prove (see
Theorem 4.2) that the MCD estimators converge (under mild conditions) to the MCD
functionals at $P$. These functionals might not be related in any way to the expectation of $P$ or
the covariance matrix (in fact, our conditions allow for P whose expectation does not even 
exist) and one might question the relevance of the MCD-functional for general P. 

First of all, we believe that it is not unreasonable to consider the MCD as a measure of
location and scale in its own right, just like the median and the MAD. Our results then
show how the natural estimator of this functional behaves. Especially in cases where the
distribution has heavy tails, the MCD functional might provide more useful quantitative
information than the mean and covariance structure, for example for confidence sets of future
realizations of $P$. In addition, we will give some explicit examples in which it is very relevant
to study the behavior of the MCD functional at general distributions.

Testing for elliptically contoured density. We know that when P has a strictly unimodal 
elliptically contoured density, the MCD location functional equals the point of symmetry and 
the MCD covariance functional equals a constant times the covariance matrix. If, for some
data set, the MCD estimator turns out to be quite different from the sample mean and sample 
covariance matrix, then this would be an indication that the underlying density is not strictly 
unimodal and elliptically contoured (this is similar to the fact that the mean and the median 
might be different for non-symmetric univariate distributions). In fact, we could turn this idea 
into a test. To analyze the asymptotic power of such a test, it is clear that the asymptotic 
behavior of the MCD estimator for a P that is not strictly unimodal and elliptically contoured, 
is very relevant. Note, however, that there exist distributions that are not strictly unimodal
and elliptically contoured, but whose MCD functional does coincide with the mean and (a 
constant times) the covariance matrix, usually due to some strong symmetries. 

Invariant co-ordinate selection. The previous idea is in the same spirit as the invariant co- 
ordinate selection (ICS) procedure recently proposed in [24], where two covariance estimators 
are compared through so-called ICS roots to reveal departures from an elliptically contoured 
distribution. The authors suggest one of the covariance estimators to be a class III scatter 
matrix, of which the MCD estimator is an example. Determining whether ICS roots differ 
significantly, or what power such a test would have, remains an open problem. This is precisely 
where the distribution of the MCD estimator at elliptical and non-elliptical distributions is 
essential. 

Distributions with convex symmetric level sets. Suppose our data $X_1, \ldots, X_n \in \mathbb{R}^k$ are a
sample from a unimodal density $f$, symmetric around $\mu \in \mathbb{R}^k$, where we use the definition
of Anderson in [2] (the level sets of $f$ are convex and symmetric around $\mu$). It follows from
that paper that when we move the center of any ellipsoid towards $\mu$ along a straight line, the
mass of the ellipsoid does not decrease. We can use this to show that the MCD location functional of $f$
equals $\mu$. Therefore, the MCD location estimator would be a robust estimator
of the point of symmetry, and our results show how this estimator behaves. Note that the class
of unimodal symmetric distributions is much bigger than the class of elliptically contoured
densities.

Independent component analysis. Consider a random vector $Z \in \mathbb{R}^k$ with a density $f$ that has
the property that for each coordinate, the mapping $y \mapsto f(z_1, \ldots, y, \ldots, z_k)$ is a univariate,
symmetric (around zero) unimodal function of $y$ for each fixed $z_1, \ldots, z_k$, and that $f$ is invariant under
coordinate permutations. For example, this would be the case if the coordinates of $Z$ are
independent and identically distributed according to a univariate symmetric and unimodal
distribution. It is clear that if the MCD functional for $f$ is unique, then from the symmetries it
follows that the location functional is zero, and the covariance functional is a constant times
the identity matrix. If we observe an affinely transformed sample from $f$, i.e., $X_1, \ldots, X_n$
where $X_i = AZ_i + \mu$ and $Z_i$ has density $f$, then the MCD estimator would be a robust
estimator of $\mu$ and (a constant times) $AA'$. Note that the density of $X_1, \ldots, X_n$ is in general not elliptically
contoured. The uniqueness of the MCD functional for an $f$ of this kind would be similar to
the results in [23] for S- and M-functionals. However, proving this is beyond the scope of this
paper, and might in fact be quite hard, given the depth of the results in [23]. The above
example has close connections with independent component analysis (ICA), a highly popular
method within many applied areas that routinely encounter multivariate data. For a good
overview see [13]. The most common ICA model considers $X$ arising as a linear mixture of $k$
independent components, i.e., $X = AZ$, where $A$ is nonsingular and the components of $Z$
are independent. The main objective of ICA is to recover the mixing matrix $A$ so that one
can 'unmix' $X$ to obtain independent components.

Contaminated distributions. An important property for any robust estimator for location 
and scatter is that it is able to recover to some extent the mean and covariance matrix of 
the underlying distribution when this distribution is contaminated. For instance, when the 
contamination has small total mass or is very far away from the center of the underlying distri- 
bution, it should not affect the corresponding functional too much. For our MCD functional, 
this is precisely the content of the following theorem, whose proof can be found in the ap- 
pendix. These results rely heavily on the methods used in this paper for general distributions, 
even if the uncontaminated distribution P is elliptically contoured. 

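Theorem 2.1 Let $P$ and $Q$ be two probability measures on $\mathbb{R}^k$ and define for $x, r \in \mathbb{R}^k$ the
translation $\tau_r(x) = x + r$. Consider, for $\varepsilon < 1/2$, the mixture

$$P_{r,\varepsilon} = (1-\varepsilon)P + \varepsilon\,Q \circ \tau_r^{-1}.$$

Denote by $\mathrm{MCD}_\gamma(\cdot)$ the MCD functional of level $\gamma$. Choose $\gamma$ such that $\varepsilon < \gamma < 1 - \varepsilon$, and
suppose that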

P(H) < 1^ 

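for all hyperplanes $H \subset \mathbb{R}^k$.

(i) Then

$$\lim_{\varepsilon \downarrow 0} \mathrm{MCD}_\gamma(P_{r,\varepsilon}) = \mathrm{MCD}_\gamma(P) \quad\text{and}\quad \lim_{\|r\| \to \infty} \mathrm{MCD}_\gamma(P_{r,\varepsilon}) = \mathrm{MCD}_{\gamma/(1-\varepsilon)}(P),$$

where the first limit should be interpreted as: every limit point is an MCD functional
at $P$ of level $\gamma$, and the second limit similarly;

(ii) Furthermore, if in addition $Q$ has a bounded support, then for all $\gamma \in (\varepsilon, 1 - \varepsilon)$, there
exists $r_0 > 0$ such that

$$\mathrm{MCD}_\gamma(P_{r,\varepsilon}) = \mathrm{MCD}_{\gamma/(1-\varepsilon)}(P),$$

for all $r \in \mathbb{R}^k$ with $\|r\| > r_0$.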

As an illustration of Theorem 2.1, consider an elliptically contoured distribution $P$ with
parameters $(\mu, \Sigma)$. The second limit in (i) shows that if the contamination is far away,
the MCD functionals of the contaminated distribution are close to $\mu$ and a multiple of $\Sigma$.
Part (ii) shows that for specific types of contamination, e.g., single-point contaminations, the
MCD functionals at the contaminated distribution recover these values exactly. The proof
of Theorem 2.1 in principle provides a constructive (but elaborate) way to find $r_0$ in terms of
$\varepsilon$, $\gamma$, $P$ and the support of $Q$.
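A finite-sample analogue of part (ii) can be observed with an exhaustive search based on definition (1.1). Take $P$ to be the empirical measure of 10 points and $Q$ a point mass (bounded support); $P_{r,\varepsilon}$ is then the empirical measure of the 10 clean points together with two copies of the translated contamination point, so $\varepsilon = 2/12$. With $\gamma = 1/2$ on the 12 contaminated points, $h = 6$, and $\gamma/(1-\varepsilon) = 3/5$ on the 10 clean points also gives $h = 6$. The sketch assumes NumPy, `brute_force_mcd` is a hypothetical helper, and the translation $r = (1000, 1000)$ is an arbitrary choice that is far away for this data; it illustrates the statement and is not a verification of the theorem.

```python
import itertools

import numpy as np


def brute_force_mcd(x, h):
    """Exhaustive MCD over all subsamples of size h; returns (mean, cov, indices)."""
    best_det, best = np.inf, None
    for idx in itertools.combinations(range(len(x)), h):
        sub = x[list(idx)]
        t = sub.mean(axis=0)
        c = (sub - t).T @ (sub - t) / h
        d = np.linalg.det(c)
        if d < best_det:
            best_det, best = d, (t, c, idx)
    return best


rng = np.random.default_rng(3)
clean = rng.standard_normal((10, 2))        # atoms of P (mass 1/10 each)
r = np.array([1000.0, 1000.0])              # translation tau_r of the point mass Q = delta_0
contaminated = np.vstack([clean, [r, r]])   # P_{r,eps} with eps = 2/12

t_c, c_c, idx_c = brute_force_mcd(contaminated, h=6)  # level gamma = 1/2 on 12 points
t_0, c_0, idx_0 = brute_force_mcd(clean, h=6)         # level gamma/(1-eps) = 3/5 on 10 points
```

For this configuration the minimizing subsample of the contaminated data contains only clean points and coincides with the one found on the clean data.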

3. Existence and characterization of an MCD-functional. By definition, the matrix
$C_P(\phi)$ is symmetric non-negative definite. Without imposing any assumptions on $P$, one
cannot expect $C_P(\phi)$ to be positive definite. We will assume that $P$ satisfies:

$$P(H) < \gamma, \quad\text{for every hyperplane } H \subset \mathbb{R}^k. \qquad (3.1)$$

This is a reasonable assumption, since if $P$ does not have this property, then there exists a
$\phi \in K_P(\gamma)$ with $\det(C_P(\phi)) = 0$ (for example, $\phi = \mathbb{1}_H$ with $P(H) \geq \gamma$). This would prove
the existence of a minimizing $\phi$, but obviously the corresponding MCD functional is not very
useful.
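The role of (3.1) is easy to see numerically: if a hyperplane (here a line in $\mathbb{R}^2$) carries mass at least $\gamma$, its indicator is an admissible trimming function whose covariance functional is singular. The NumPy sketch below uses an arbitrary made-up configuration of six points on the line $y = 2x + 1$ and four points off it, with $\gamma = 1/2$.

```python
import numpy as np

on_line = np.array([[s, 2 * s + 1] for s in range(6)], dtype=float)
off_line = np.array([[0.0, 5.0], [3.0, -1.0], [5.0, 2.0], [1.0, 8.0]])
x = np.vstack([on_line, off_line])
weights = np.full(10, 0.1)               # empirical measure: P(H) = 0.6 >= gamma = 0.5
phi_h = np.r_[np.ones(6), np.zeros(4)]   # phi = indicator of the line H

# Trimmed functionals (2.1) for this discrete P and phi = 1_H.
w = weights * phi_h
t = w @ x / w.sum()
c = (w[:, None] * (x - t)).T @ (x - t) / w.sum()
det_c = np.linalg.det(c)                 # zero up to rounding: c has rank one
```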

We first establish the existence of a minimizing $\phi \in K_P(\gamma)$. For later purposes we do not
only prove existence at $P$, but also at probability measures $P_t$, for which the sequence $(P_t)$
converges weakly to $P$, as $t \to \infty$. For ease of notation we continue to write $P_0$ instead of $P$
and for $t \geq 0$ write

$$T_t = T_{P_t}, \quad C_t = C_{P_t}, \quad r_t = r_{P_t}, \quad\text{and}\quad K_t(\gamma) = K_{P_t}(\gamma). \qquad (3.2)$$

The next proposition shows that eventually the smallest eigenvalue of the covariance
functional is bounded away from zero uniformly in $\phi$ and $t$.

Proposition 3.1 Suppose $P_0$ satisfies (3.1) and let $P_t \to P_0$ weakly. Then there exist $\lambda_0 > 0$
and $t_0 \geq 1$ such that for $t = 0$ and all $t \geq t_0$, all $\phi \in K_t(\gamma)$, and all $a \in S^{k-1}$ (the unit sphere in $\mathbb{R}^k$),
we have

$$\int \bigl(a'(x - T_t(\phi))\bigr)^2\phi(x)\,P_t(dx) \geq \lambda_0.$$

In particular, since $\int \phi\,dP_t \leq 1$, this means that the smallest eigenvalue of $C_t(\phi)$ is at least $\lambda_0$.

An immediate corollary is that if $\det(C_t(\phi))$ is uniformly bounded, there exists a compact set
that contains the location and covariance functionals for sufficiently large $t$ (see Lemma 6.2
in the appendix). This will become very useful in establishing continuity of the functionals
in Section 4. For the moment, we use this result to show that for minimizing $\det(C_t(\phi))$, one
may restrict to functions $\phi$ with bounded support.

For $R > 0$, define the ball $B_R = \{x \in \mathbb{R}^k : \|x\| \leq R\}$ and for $t \geq 0$ define the class

$$K_t^R(\gamma) = \{\phi \in K_t(\gamma) : \{\phi \neq 0\} \subset B_R\}.$$

Clearly, $K_t^R(\gamma) \subset K_t(\gamma)$. The next proposition shows that for any $\phi \in K_t(\gamma)$ we can always
find a $\psi$ with bounded support in $K_t^R(\gamma)$ whose determinant is no larger than that of $\phi$.

Proposition 3.2 Suppose that $P_0$ satisfies (3.1) and $P_t \to P_0$ weakly. There exist $R > 0$
and $t_0 \geq 1$ such that for $t = 0$, all $t \geq t_0$ and all $\phi \in K_t(\gamma)$, there exists $\psi \in K_t^R(\gamma)$ with

$$\det(C_t(\psi)) \leq \det(C_t(\phi)).$$

Proposition 3.2 illustrates the general heuristic that if $\phi$ has $P_t$-mass far away from $T_t(\phi)$,
then moving this mass closer towards $T_t(\phi)$ will decrease the determinant of the covariance
matrix. Together with Proposition 3.1 this establishes the existence of at least one MCD
functional for the probability measure $P_0$. Moreover, if $P_t \to P_0$ weakly, then at least one
MCD functional exists for $P_t$ for sufficiently large $t$.

Theorem 3.1 Suppose $P_0$ satisfies (3.1) and let $P_t \to P_0$ weakly. Then there exist $R > 0$ and
$t_0 \geq 1$, such that for $t = 0$ and $t \geq t_0$, there exists $\phi_t \in K_t^R(\gamma)$, which minimizes $\det(C_t(\phi))$
over $K_t(\gamma)$.

In the remainder of this section, we provide a characterization of a minimizing $\phi$, which
includes a separating ellipsoid property for the MCD functional. A similar result has been
obtained in [3] for the empirical measure and in [5] for single-point perturbations of
distributions with a unimodal elliptically contoured density. We will denote the interior of a set $E$
by $E^\circ$, and the (topological) boundary by $\partial E$.
Theorem 3.2 Let $\phi \in K_P(\gamma)$ be such that $(T_P(\phi), C_P(\phi))$ is an MCD functional at $P$ and
let $E_P(\phi) = E(T_P(\phi), C_P(\phi), r_P(\phi))$ be the corresponding minimizing ellipsoid. Then

$$\mathbb{1}_{E_P(\phi)^\circ} \leq \phi \leq \mathbb{1}_{E_P(\phi)} \quad (P\text{-a.e.}).$$

Furthermore, either $\phi = 0$ on $\partial E_P(\phi)$ ($P$-a.e.), or $\phi = 1$ on $\partial E_P(\phi)$ ($P$-a.e.), or there exists
$x \in \partial E_P(\phi)$ such that $P(\partial E_P(\phi)) = P(\{x\})$.

The theorem shows that a minimizing trimming function $\phi$ is almost the indicator function
of an ellipsoid with center $T_P(\phi)$ and covariance structure $C_P(\phi)$. When $P$ has no mass on
the boundary of the ellipsoid, then $\phi$ is equal to the indicator function of this ellipsoid. If
the interior of the ellipsoid $E(T_P(\phi), C_P(\phi), r_P(\phi))$ has mass strictly smaller than $\gamma$, then
either $\phi$ equals 1 on the entire boundary of the ellipsoid, in which case the (closed) ellipsoid
has $P$-mass exactly $\gamma$, or $P$ only has mass in exactly one point on the boundary, and $\phi$ adapts
its value in that point such that it has total $P$-mass $\gamma$.

Theorem 3.2 holds for any probability measure, in particular for the empirical measure $P_n$
and for perturbed measures $P_{\varepsilon,x}$, in which case we obtain results analogous to Theorem 2 in [3]
and Proposition 1 in [5], respectively. However, note that the characterization in Theorem 3.2 
is more precise and is such that the center and covariance structure of the separating ellipsoid 
are exactly the MCD functionals themselves. 

4. Continuity of the MCD functional. Consider a sequence $(T_t(\phi_t), C_t(\phi_t))$ of MCD
functionals corresponding to a sequence of probability measures $P_t \to P_0$ weakly. We
investigate under what conditions $(T_t(\phi_t), C_t(\phi_t))$ converges and whether each limit point will be
an MCD functional corresponding to $P_0$. Our approach requires $\int \phi\,dP_t \to \int \phi\,dP_0$ uniformly
in minimizing $\phi$. The following condition suffices:

$$\sup_{E \in \mathcal{E}} |P_t(E) - P_0(E)| \to 0, \quad\text{as } t \to \infty, \qquad (4.1)$$

where $\mathcal{E}$ denotes the class of all ellipsoids. This may seem restrictive, but it is either
automatically fulfilled for sequences that are important for our purposes or a mild condition on $P_0$
suffices. For instance, when $P_t$ is a sequence of empirical measures, then (4.1) holds
automatically (with probability one) by standard results from empirical process theory (e.g., see Theorem 11.14 in [17]),
because the ellipsoids form a class with polynomial discrimination or a Vapnik–Červonenkis
class. Condition (4.1) also holds for sequences of perturbed measures $P_{\varepsilon,x}$, as $\varepsilon \downarrow 0$. In general,
if $P_0(\partial C) = 0$ for all measurable convex $C \subset \mathbb{R}^k$, then condition (4.1) holds for any sequence
$P_t \to P_0$ weakly (see Theorem 4.2 in [18]). Note that this is always trivially true if $P_0$ has a
density.

For later purposes we prove continuity not only for MCD functional minimizing
functions $\phi_t$, but for any sequence of functions $\psi_t$ with uniformly bounded support that satisfy
the same characteristics as $\phi_t$ and for which $\det(C_t(\psi_t))$ is close to $\det(C_t(\phi_t))$.

Theorem 4.1 Suppose $P_0$ satisfies (3.1). Let $P_t \to P_0$ weakly and suppose that (4.1) holds.
For $t \geq 1$, let $\psi_t \in K_t(\gamma)$ such that $\psi_t \leq \mathbb{1}_{E_t}$, where $E_t = E(T_t(\psi_t), C_t(\psi_t), r_t(\psi_t))$, and
suppose there exists $R > 0$ such that $\{\psi_t \neq 0\} \subset B_R$, for $t$ sufficiently large. Suppose that

$$\det(C_t(\psi_t)) - \det(C_t(\phi_t)) \to 0, \quad\text{as } t \to \infty,$$

where $\phi_t$ minimizes $\det(C_t(\phi))$ over $K_t(\gamma)$. Then


(i) there exists a convergent subsequence $(T_{t_j}(\psi_{t_j}), C_{t_j}(\psi_{t_j}))$;
(ii) the limit point of any convergent subsequence is an MCD functional at $P_0$.
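Condition (4.1) involves a supremum over all ellipsoids, which is hard to compute. For the subclass of balls $E(0, I_2, s)$ centred at the origin and $P_0$ the standard normal distribution on $\mathbb{R}^2$, however, $P_0(E(0, I_2, s)) = 1 - e^{-s^2/2}$ (the chi-square distribution function with two degrees of freedom evaluated at $s^2$), and the supremum over $s$ of $|P_n(E) - P_0(E)|$ is a one-dimensional Kolmogorov–Smirnov statistic. The NumPy sketch below (the function name `sup_deviation_balls` is hypothetical) computes it for empirical measures; it only illustrates uniform convergence over this one-parameter subclass, not (4.1) itself.

```python
import numpy as np


def sup_deviation_balls(sample):
    """sup_s |P_n(E(0, I, s)) - P_0(E(0, I, s))| for P_0 = N(0, I_2).

    With u_i = P_0(E(0, I, ||X_i||)) = 1 - exp(-||X_i||^2 / 2), this equals the
    Kolmogorov-Smirnov distance between the empirical distribution of the u_i
    and the uniform distribution on [0, 1].
    """
    u = np.sort(1.0 - np.exp(-0.5 * np.sum(sample ** 2, axis=1)))
    n = len(u)
    i = np.arange(1, n + 1)
    return float(max(np.max(i / n - u), np.max(u - (i - 1) / n)))


rng = np.random.default_rng(4)
for n in (100, 10_000, 100_000):
    print(n, sup_deviation_balls(rng.standard_normal((n, 2))))
```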

An immediate corollary is that in case the MCD functional at $P_0$ is uniquely defined, all
possible MCD functionals at $P_t$ are consistent. For later purposes, we also need that $r_t(\phi_t)$
converges. This may not be the case if $P_0$ has no mass directly outside the boundary of its
minimizing ellipsoid. For this reason, we also require that

$$P_0\bigl(E(T_0(\phi_0), C_0(\phi_0), r_0(\phi_0) + \varepsilon)\bigr) > \gamma, \quad\text{for all } \varepsilon > 0. \qquad (4.2)$$

Note that this condition is trivially true if $P_0$ has a positive density in a neighborhood of the
boundary of $E(T_0(\phi_0), C_0(\phi_0), r_0(\phi_0))$.

Corollary 4.1 Suppose $P_0$ satisfies (3.1) and that the MCD functional $(T_0(\phi_0), C_0(\phi_0))$ is
uniquely defined at $P_0$. Let $P_t \to P_0$ weakly and suppose that (4.1) holds. For $t \geq 1$, let
$\psi_t \in K_t(\gamma)$ such that $\psi_t \leq \mathbb{1}_{E_t}$, where $E_t = E(T_t(\psi_t), C_t(\psi_t), r_t(\psi_t))$, and suppose there exists
$R > 0$ such that $\{\psi_t \neq 0\} \subset B_R$, for $t$ sufficiently large. Suppose that

$$\det(C_t(\psi_t)) - \det(C_t(\phi_t)) \to 0,$$

where $\phi_t$ minimizes $\det(C_t(\phi))$ over $K_t(\gamma)$. Then,

(i) $(T_t(\psi_t), C_t(\psi_t)) \to (T_0(\phi_0), C_0(\phi_0))$.

If in addition $P_0$ satisfies (4.2), then

(ii) $r_t(\psi_t) \to r_0(\phi_0)$.

Uniqueness of the MCD functional has been proven in [3] for distributions Pq that have a 
unimodal elliptically contoured density. For general distributions, one cannot expect such a 
general result. For instance, for certain bimodal distributions or for a spherically symmetric 
uniform distribution which is positive on a large enough disc, the MCD functional is no longer 
unique. 

4.1. Consistency of the MCD estimators. For $n = 1, 2, \ldots$, let $P_n$ denote the empirical
measure corresponding to a sample from $P_0$. From definitions (1.1) and (2.1) it is easy to see
that the MCD estimators can be written in terms of the functionals in (2.1) as follows

$$\widehat{T}_n(S_n) = T_n(\mathbb{1}_{S_n}), \qquad \widehat{C}_n(S_n) = C_n(\mathbb{1}_{S_n}),$$

where we use the notation introduced in (3.2). Moreover, define $\widehat{r}_n(S_n) = r_n(\mathbb{1}_{S_n})$. We should
emphasize that $\widehat{T}_n(S_n)$ and $\widehat{C}_n(S_n)$ may differ from the actual MCD functionals $T_n(\phi_n)$ and
$C_n(\phi_n)$. Obviously, if these differences tend to zero, then consistency of the MCD estimators
would follow immediately from Theorem 4.1, but unfortunately we have not been able to find
an easy argument for this. However, we can show that the determinants of the covariance
matrices are close with probability one, which suffices for our purposes.

Proposition 4.1 Suppose $P_0$ satisfies (3.1). Then for each MCD estimator minimizing
subsample $S_n$ and each MCD functional minimizing function $\phi_n$, we have

$$\det(\widehat{C}_n(S_n)) - \det(C_n(\phi_n)) = O(n^{-1}),$$

with probability one.



(4.3) 
This does not necessarily mean that $\widehat{T}_n(S_n) - T_n(\phi_n)$ and $\widehat{C}_n(S_n) - C_n(\phi_n)$ are also of the order
$O(n^{-1})$. But in view of Corollary 4.1, it suffices to establish a separating ellipsoid property
and uniformly bounded support for the minimizing subsample. The latter result can be found
in the appendix, whereas the separating ellipsoid property is stated in the next proposition. 

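Proposition 4.2 Let $S_n$ be a minimizing subsample for the MCD estimator and define the
corresponding ellipsoid $\widehat{E}_n = E(\widehat{T}_n(S_n), \widehat{C}_n(S_n), \widehat{r}_n(S_n))$. Then $S_n$ has exactly $\lceil n\gamma \rceil$ points,
$S_n \subset \widehat{E}_n$ and $\widehat{E}_n$ only contains points of $S_n$.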

This separating ellipsoid property is somewhat different from the one in Theorem 3.2 (for the
empirical measure) and from the one in [3]. The ellipsoid $\widehat{E}_n$ has the MCD estimators as center
and covariance structure instead of the trimmed sample mean and covariance corresponding
to the minimizing subsample excluding a point that is most outlying (see [3]). The advantage
of the characterization given in Proposition 4.2 is that integrating over $S_n$ or $\widehat{E}_n$ with respect
to $P_n$ is the same, which will become very useful later on. We now have the following theorem.
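The separating ellipsoid property in Proposition 4.2 is closely related to the concentration step ("C-step") of the FAST-MCD algorithm of Rousseeuw and Van Driessen: starting from any $h$-subset, refitting on the $h$ points with the smallest Mahalanobis distances with respect to the current trimmed mean and covariance never increases the determinant, and a subset that this step reproduces is separated from the remaining points by its own ellipsoid. The NumPy sketch below iterates such steps; it is not the method used in this paper, the helper names are hypothetical, and a fixed point need not be the global MCD minimizer.

```python
import numpy as np


def mean_cov(x):
    """Trimmed mean and covariance of a subsample with 1/h normalization, as in (1.1)."""
    t = x.mean(axis=0)
    d = x - t
    return t, d.T @ d / len(x)


def c_step(x, subset, h):
    """Sorted indices of the h points closest (Mahalanobis) to the fit of x[subset]."""
    t, c = mean_cov(x[subset])
    diff = x - t
    dist2 = np.einsum("ij,ij->i", diff, np.linalg.solve(c, diff.T).T)
    return np.sort(np.argsort(dist2)[:h])


def concentrate(x, h, max_steps=100):
    """Iterate C-steps from the first h points until the subset stops changing."""
    subset = np.arange(h)
    dets = [np.linalg.det(mean_cov(x[subset])[1])]
    for _ in range(max_steps):
        new = c_step(x, subset, h)
        if np.array_equal(new, subset):
            break
        subset = new
        dets.append(np.linalg.det(mean_cov(x[subset])[1]))
    return subset, dets


rng = np.random.default_rng(6)
data = np.vstack([rng.standard_normal((40, 2)), rng.normal(6.0, 0.5, size=(10, 2))])
subset, dets = concentrate(data, h=25)
```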

Theorem 4.2 Suppose $P_0$ satisfies (3.1) and that the MCD functional $(T_0(\phi_0), C_0(\phi_0))$ is
uniquely defined at $P_0$. For $n \geq 1$, let $S_n$ be a minimizing subsample for the MCD estimator.
Then

(i) $(\widehat{T}_n(S_n), \widehat{C}_n(S_n)) \to (T_0(\phi_0), C_0(\phi_0))$, with probability one.

If, in addition, $P_0$ satisfies (4.2), then

(ii) $\widehat{r}_n(S_n) \to r_0(\phi_0)$, with probability one.

As a special case, where $P_0$ has a unimodal elliptically contoured density, we recover
Theorem 3 in [3]. With Theorems 4.1 and 4.2 it turns out that the difference between the MCD
estimator $(\widehat{T}_n(S_n), \widehat{C}_n(S_n))$ and the MCD functional $(T_n(\phi_n), C_n(\phi_n))$ indeed tends to zero
with probability one. However, we were not able to find an easier, direct argument.

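5. Asymptotic normality and influence function. For $n = 1, 2, \ldots$, let $S_n$ be a
minimizing subsample for the MCD estimator and for ease of notation, write

$$\widehat{\mu}_n = \widehat{T}_n(S_n), \quad \widehat{\Sigma}_n = \widehat{C}_n(S_n) = \widehat{\Gamma}_n^2, \quad \widehat{\rho}_n = \widehat{r}_n(S_n), \quad\text{and}\quad \widehat{E}_n = E(\widehat{\mu}_n, \widehat{\Sigma}_n, \widehat{\rho}_n),$$

and define $\widehat{\theta}_n = (\widehat{\mu}_n, \widehat{\Gamma}_n, \widehat{\rho}_n)$ in $\mathbb{R}^k \times \mathrm{PDS}(k) \times \mathbb{R}$, where $\mathrm{PDS}(k)$ denotes the class of all
positive definite symmetric matrices of order $k$. Note that $\widehat{\Gamma}_n$ is uniquely defined in $\mathrm{PDS}(k)$.
Similarly, let $P_n$ denote the empirical measure corresponding to a sample from $P_0$, and for
$n = 0, 1, 2, \ldots$, let $\phi_n$ be a minimizing trimming function for the MCD functional and write

$$\mu_n = T_n(\phi_n), \quad \Sigma_n = C_n(\phi_n) = \Gamma_n^2, \quad \rho_n = r_n(\phi_n), \quad\text{and}\quad E_n = E(\mu_n, \Sigma_n, \rho_n), \qquad (5.1)$$

where $T_n$, $C_n$ and $r_n$ are defined in (3.2), and write $\theta_n = (\mu_n, \Gamma_n, \rho_n)$. According to
Corollary 4.1 and Theorem 4.2, under very mild conditions on $P_0$, we have $\widehat{\theta}_n \to \theta_0$ and $\theta_n \to \theta_0$
with probability one, where $\theta_0 = (\mu_0, \Gamma_0, \rho_0)$ corresponds to $P_0$ as defined in (5.1). The limit
distributions of $\widehat{\theta}_n - \theta_0$ and $\theta_n - \theta_0$ are equal and can be obtained by the same argument. We
briefly sketch the main steps for the MCD estimator.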
Consider the estimator matrix equation in (1.1),

$$\widehat{\Sigma}_n = \frac{1}{P_n(S_n)}\int_{S_n}(x - \widehat{\mu}_n)(x - \widehat{\mu}_n)'\,P_n(dx).$$
After multiplying from the left and the right by $\widehat{\Gamma}_n^{-1}$, rearranging terms and replacing $S_n$ by $\widehat{E}_n$
(which leaves the integral unchanged according to Proposition 4.2), we obtain a covariance
valued M-estimator type score equation:

$$0 = \int_{\widehat{E}_n}\bigl(\widehat{\Gamma}_n^{-1}(x - \widehat{\mu}_n)(x - \widehat{\mu}_n)'\widehat{\Gamma}_n^{-1} - I_k\bigr)\,P_n(dx).$$



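Similarly, one can obtain a vector valued M-estimator type score equation from the location equation in (1.1), and the equality $\mathbb{P}_n(\hat E_n) = \lceil n\gamma\rceil/n = \gamma + O(n^{-1})$ can be put into a real valued score equation. Putting everything together, we conclude that $\hat\theta_n$ satisfies

$$0 = \int\Psi(y, \hat\theta_n)\,\mathbb{P}_n(dy) + O(n^{-1}), \tag{5.2}$$

where $\Psi = (\Psi_1, \Psi_2, \Psi_3)$ is defined as

$$\begin{aligned}
\Psi_1(y, \theta) &= 1_{\{\|G^{-1}(y - m)\| \le r\}}\,G^{-1}(y - m),\\
\Psi_2(y, \theta) &= 1_{\{\|G^{-1}(y - m)\| \le r\}}\left(G^{-1}(y - m)(y - m)'G^{-1} - I_k\right),\\
\Psi_3(y, \theta) &= 1_{\{\|G^{-1}(y - m)\| \le r\}} - \gamma,
\end{aligned} \tag{5.3}$$

where $\theta = (m, G, r)$, with $y, m \in \mathbb{R}^k$, $r > 0$, and $G \in \mathrm{PDS}(k)$. Rewrite equation (5.2) as

$$\begin{aligned}
0 = \Lambda(\hat\theta_n) &+ \int\Psi(y, \theta_0)\,(\mathbb{P}_n - P_0)(dy)\\
&+ \int\left(\Psi(y, \hat\theta_n) - \Psi(y, \theta_0)\right)(\mathbb{P}_n - P_0)(dy) + O(n^{-1}),
\end{aligned} \tag{5.4}$$

where

$$\Lambda(\theta) = \int\Psi(y, \theta)\,P_0(dy). \tag{5.5}$$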
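To make (5.2) and (5.3) concrete, the sketch below (a hypothetical illustration for $k = 2$ in plain Python; all helper names are our own) computes an MCD estimator by exhaustive search over all subsamples of size $h = \lceil n\gamma\rceil$, which is feasible only for very small $n$. It then forms $\hat\theta_n = (\hat\mu_n, \hat\Gamma_n, \hat\rho_n)$, with $\hat\Gamma_n$ the symmetric square root of $\hat\Sigma_n$ and $\hat\rho_n$ the smallest radius for which the ellipsoid contains at least $h$ sample points, and averages $\Psi(X_i, \hat\theta_n)$ over the sample. Assuming the data are in general position, so that $\hat E_n$ captures exactly the points of $S_n$, the location and covariance components vanish up to rounding, and the third component equals $\lceil n\gamma\rceil/n - \gamma$, which is zero for $n = 10$ and $\gamma = 0.5$.

```python
import itertools
import math
import random


def mean_cov(pts):
    """Trimmed sample mean and covariance of 2-D points (divisor = #points)."""
    h = len(pts)
    mx = sum(p[0] for p in pts) / h
    my = sum(p[1] for p in pts) / h
    sxx = sum((p[0] - mx) ** 2 for p in pts) / h
    syy = sum((p[1] - my) ** 2 for p in pts) / h
    sxy = sum((p[0] - mx) * (p[1] - my) for p in pts) / h
    return (mx, my), ((sxx, sxy), (sxy, syy))


def det2(a):
    return a[0][0] * a[1][1] - a[0][1] * a[1][0]


def inv2(a):
    d = det2(a)
    return ((a[1][1] / d, -a[0][1] / d), (-a[1][0] / d, a[0][0] / d))


def sqrtm2(a):
    """Symmetric positive definite square root of a 2x2 SPD matrix."""
    s = math.sqrt(det2(a))
    t = math.sqrt(a[0][0] + a[1][1] + 2.0 * s)
    return (((a[0][0] + s) / t, a[0][1] / t), (a[1][0] / t, (a[1][1] + s) / t))


def matvec(a, v):
    return (a[0][0] * v[0] + a[0][1] * v[1], a[1][0] * v[0] + a[1][1] * v[1])


def psi(y, m, g, r, gamma):
    """Psi(y, theta) from (5.3) for theta = (m, G, r), flattened to 2 + 4 + 1 entries."""
    z = matvec(inv2(g), (y[0] - m[0], y[1] - m[1]))
    # tiny relative tolerance so that the point on the boundary of E counts as inside
    ind = 1.0 if z[0] ** 2 + z[1] ** 2 <= r * r * (1.0 + 1e-9) else 0.0
    psi1 = [ind * z[0], ind * z[1]]
    psi2 = [ind * (z[i] * z[j] - (1.0 if i == j else 0.0)) for i in range(2) for j in range(2)]
    return psi1 + psi2 + [ind - gamma]


def mcd_brute_force(xs, gamma):
    """Exhaustive MCD over all subsamples of size h = ceil(n * gamma)."""
    n = len(xs)
    h = math.ceil(n * gamma)
    best = min(itertools.combinations(range(n), h),
               key=lambda idx: det2(mean_cov([xs[i] for i in idx])[1]))
    mu, sigma = mean_cov([xs[i] for i in best])
    g = sqrtm2(sigma)
    ginv = inv2(g)
    dist = sorted(math.hypot(*matvec(ginv, (x[0] - mu[0], x[1] - mu[1]))) for x in xs)
    rho = dist[h - 1]  # smallest r with P_n(E(mu, Sigma, r)) >= gamma
    return best, mu, sigma, g, rho


if __name__ == "__main__":
    random.seed(7)
    xs = [(random.gauss(0.0, 1.0), random.gauss(0.0, 1.0)) for _ in range(10)]
    best, mu, sigma, g, rho = mcd_brute_force(xs, 0.5)
    scores = [psi(x, mu, g, rho, 0.5) for x in xs]
    print([sum(col) / len(xs) for col in zip(*scores)])
```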



In order to determine the limiting distribution of $\hat\theta_n - \theta_0$, we proceed as follows. The first term on the right hand side of (5.4) can be approximated by a first order Taylor expansion that is linear in $\hat\theta_n - \theta_0$, and the second term can be treated by the central limit theorem. Most of the difficulty is contained in the third term, which must be shown to be of the order $o_P(n^{-1/2})$. We apply empirical process theory, for which we need $\int\|\Psi(y, \hat\theta_n) - \Psi(y, \theta_0)\|^2\,P_0(dy) \to 0$. For this, it suffices to impose

$$P_0(\partial E_0) = 0. \tag{5.6}$$

For the MCD functional $\theta_n$ the argument is the same, apart from the fact that replacing $\phi_n$ by $1_{E_n}$ requires an additional condition on $P_0$, i.e.,

$$P_0 \text{ has no atoms.} \tag{5.7}$$

Note that (5.6) and (5.7) are trivially true if $P_0$ has a density. By representing elements of $\mathbb{R}^k \times \mathrm{PDS}(k) \times \mathbb{R}$ as vectors, we then have the following central limit theorem for the MCD estimators and the MCD functional at $\mathbb{P}_n$.
Theorem 5.1 Let $P_0$ satisfy (3.1), (4.2) and (5.6). Suppose that $(\mu_0, \Sigma_0)$ is uniquely defined at $P_0$. If $\Lambda$, as defined in (5.5), has a non-singular derivative at $\theta_0$, then

$$\hat\theta_n - \theta_0 = -\Lambda'(\theta_0)^{-1}\frac1n\sum_{i=1}^n\left(\Psi(X_i, \theta_0) - \mathbb{E}\Psi(X_i, \theta_0)\right) + o_P(n^{-1/2}),$$

where $\Psi$ is defined in (5.3). If in addition $P_0$ satisfies (5.7), then

$$\theta_n - \theta_0 = -\Lambda'(\theta_0)^{-1}\frac1n\sum_{i=1}^n\left(\Psi(X_i, \theta_0) - \mathbb{E}\Psi(X_i, \theta_0)\right) + o_P(n^{-1/2}).$$

In particular, this means that $\sqrt n(\hat\theta_n - \theta_0)$ and $\sqrt n(\theta_n - \theta_0)$ are asymptotically normal with mean zero and covariance matrix

$$\Lambda'(\theta_0)^{-1}M\left(\Lambda'(\theta_0)^{-1}\right)',$$

where $M$ is the covariance matrix of $\Psi(X_1, \theta_0)$.

Now that Theorem 5.1 has been established, it turns out that the MCD estimator and the MCD functional (at $\mathbb{P}_n$) are asymptotically equivalent, i.e., $\hat\theta_n - \theta_n = o_P(n^{-1/2})$. Although this seems natural, we have not been able to find an easier, direct argument for this, in which case we could have avoided establishing parallel results, such as the ones in Section 4.1.
An immediate consequence of Theorem 5.1 is asymptotic normality of the MCD location estimator $\sqrt n(\hat\mu_n - \mu_0)$. Furthermore, since

$$\hat\Sigma_n - \Sigma_0 = \Gamma_0(\hat\Gamma_n - \Gamma_0) + (\hat\Gamma_n - \Gamma_0)\Gamma_0 + (\hat\Gamma_n - \Gamma_0)^2,\qquad\text{where } (\hat\Gamma_n - \Gamma_0)^2 = O_P(n^{-1}),$$

Theorem 5.1 also yields asymptotic normality of the MCD covariance estimator $\sqrt n(\hat\Sigma_n - \Sigma_0)$ and of $\sqrt n(\hat\rho_n - \rho_0)$. In [4] a precise expression is obtained for $\Lambda'(\theta_0)$ for $P_0$ with a density $f$, and non-singularity of $\Lambda'(\theta_0)$ is proven if $f$ has enough symmetry. This includes distributions with an elliptically contoured density, so that as a special case of Theorem 5.1, when $P_0$ has a unimodal elliptically contoured density, one may recover Theorem 4 in [3] for the location MCD estimator.
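As a worked check of Theorem 5.1 in the simplest case, consider $k = 1$ and $P_0 = N(0,1)$, so that $\theta = (m, G, r)$ with $G > 0$ a scalar. Let $q$ satisfy $P_0(|X| \le q) = \gamma$, let $\varphi$ be the standard normal density, and note that $\sigma_0^2 = C_0(\phi_0) = (\gamma - 2q\varphi(q))/\gamma$ and $\rho_0\sigma_0 = q$. By symmetry of $P_0$, $\Lambda_1(0, G, r) = 0$ for all $G, r$, while $\Lambda_2$ and $\Lambda_3$ are even in $m$, so $\Lambda'(\theta_0)$ is block diagonal and the asymptotic variance of $\sqrt n(\hat\mu_n - \mu_0)$ reduces to $M_{11}/(\partial\Lambda_1/\partial m)^2$. A direct computation gives $\partial\Lambda_1/\partial m = (2q\varphi(q) - \gamma)/\sigma_0$ and $M_{11} = (\gamma - 2q\varphi(q))/\sigma_0^2$, hence an asymptotic variance of $1/(\gamma - 2q\varphi(q))$. The sketch below, our own illustration with hypothetical function names, evaluates this sandwich entry numerically (Simpson's rule plus a central difference) and compares it with the closed form.

```python
import math


def std_normal_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def std_normal_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def half_width(gamma):
    """q with P(|X| <= q) = gamma for X ~ N(0, 1), by bisection."""
    lo, hi = 0.0, 10.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if 2.0 * std_normal_cdf(mid) - 1.0 < gamma:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def simpson(f, a, b, n=2000):
    """Composite Simpson rule on [a, b] with an even number n of panels."""
    step = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4.0 if i % 2 else 2.0) * f(a + i * step)
    return total * step / 3.0


def lambda_1(m, g, r):
    """Lambda_1(m, G, r) = E[1{|X - m| <= r G} (X - m) / G] under N(0, 1)."""
    c = r * g
    return simpson(lambda y: (y - m) / g * std_normal_pdf(y), m - c, m + c)


def mcd_location_asvar(gamma, eps=1e-5):
    """Sandwich entry M_11 / (dLambda_1/dm)^2 at theta_0, for k = 1."""
    q = half_width(gamma)
    sigma0 = math.sqrt((gamma - 2.0 * q * std_normal_pdf(q)) / gamma)
    rho0 = q / sigma0
    # derivative in m at theta_0 by a central difference
    d_lambda = (lambda_1(eps, sigma0, rho0) - lambda_1(-eps, sigma0, rho0)) / (2.0 * eps)
    # M_11 = Var Psi_1(X, theta_0); its mean is zero by symmetry
    m11 = simpson(lambda y: y * y * std_normal_pdf(y), -q, q) / sigma0 ** 2
    return m11 / d_lambda ** 2


def closed_form_asvar(gamma):
    q = half_width(gamma)
    return 1.0 / (gamma - 2.0 * q * std_normal_pdf(q))


if __name__ == "__main__":
    for gamma in (0.5, 0.75):
        v = mcd_location_asvar(gamma)
        print(gamma, v, closed_form_asvar(gamma), "efficiency", 1.0 / v)
```

For $\gamma = 0.5$ the closed form gives an asymptotic efficiency of about 0.071 relative to the sample mean.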

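To determine the influence function, let $\phi_{\varepsilon,x}$ be the minimizing $\phi$-function for $P_{\varepsilon,x}$ and let

$$\mu_{\varepsilon,x} = T_{P_{\varepsilon,x}}(\phi_{\varepsilon,x}),\quad \Sigma_{\varepsilon,x} = C_{P_{\varepsilon,x}}(\phi_{\varepsilon,x}) = \Gamma_{\varepsilon,x}^2,\quad \rho_{\varepsilon,x} = r_{P_{\varepsilon,x}}(\phi_{\varepsilon,x}),$$

and $E_{\varepsilon,x} = E(\mu_{\varepsilon,x}, \Sigma_{\varepsilon,x}, \rho_{\varepsilon,x})$, so that $(\mu_{\varepsilon,x}, \Sigma_{\varepsilon,x})$ is an MCD functional at $P_{\varepsilon,x}$ with corresponding minimizing ellipsoid $E_{\varepsilon,x}$. We follow the same kind of argument as before to obtain M-type score equations, by rewriting equations (2.1) at $P_{\varepsilon,x}$ and replacing $\phi_{\varepsilon,x}$ by $1_{E_{\varepsilon,x}}$. Note, however, that from the characterization given in Theorem 3.2,

$$\int\left(1_{E_{\varepsilon,x}} - \phi_{\varepsilon,x}\right)dP_0 = \begin{cases} P_0(\partial E_{\varepsilon,x}) & \text{if } \phi_{\varepsilon,x} = 0 \text{ on } \partial E_{\varepsilon,x},\\ 0 & \text{if } \phi_{\varepsilon,x} = 1 \text{ on } \partial E_{\varepsilon,x},\\ (1 - \phi_{\varepsilon,x}(z))\,P_0(\{z\}) & \text{otherwise, for some } z \in \partial E_{\varepsilon,x}.\end{cases}$$

This means that in order to replace integrals over $\phi_{\varepsilon,x}$ by integrals over $E_{\varepsilon,x}$, we need a stronger condition on $P_0$, i.e.,

$$P_0(\partial E) = 0 \quad\text{for any ellipsoid } E. \tag{5.8}$$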
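Now, denote $\theta_{\varepsilon,x} = (\mu_{\varepsilon,x}, \Gamma_{\varepsilon,x}, \rho_{\varepsilon,x})$. Then, similar to (5.2), we obtain

$$0 = (1 - \varepsilon)\Lambda(\theta_{\varepsilon,x}) + \varepsilon\Psi_\varepsilon(x), \tag{5.9}$$

where $\Lambda$ is defined in (5.5) and $\Psi_\varepsilon = (\Psi_{1,\varepsilon}, \Psi_{2,\varepsilon}, \Psi_{3,\varepsilon})$, with

$$\begin{aligned}
\Psi_{1,\varepsilon}(x) &= \phi_{\varepsilon,x}(x)\,\Gamma_{\varepsilon,x}^{-1}(x - \mu_{\varepsilon,x}),\\
\Psi_{2,\varepsilon}(x) &= \phi_{\varepsilon,x}(x)\left(\Gamma_{\varepsilon,x}^{-1}(x - \mu_{\varepsilon,x})(x - \mu_{\varepsilon,x})'\Gamma_{\varepsilon,x}^{-1} - I_k\right),\\
\Psi_{3,\varepsilon}(x) &= \phi_{\varepsilon,x}(x) - \gamma.
\end{aligned}$$

Define $\Theta(P) = (\mu(P), \Gamma(P), \rho(P))$, where $\mu(P) = T_P(\phi_P)$, $\Gamma(P)^2 = C_P(\phi_P)$, $\rho(P) = r_P(\phi_P)$, and $\phi_P$ denotes a minimizing trimming function. The influence function of $\Theta(P)$ at $P_0$ is defined as

$$\mathrm{IF}(x; \Theta, P_0) = \lim_{\varepsilon\downarrow0}\frac{\Theta((1 - \varepsilon)P_0 + \varepsilon\delta_x) - \Theta(P_0)}{\varepsilon},$$

if this limit exists, where $\delta_x$ is the Dirac measure at $x \in \mathbb{R}^k$. The following theorem shows that this limit exists and provides its expression.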

Theorem 5.2 Suppose $P_0$ satisfies (3.1), (4.2), and (5.8). Suppose that $(\mu_0, \Sigma_0)$ is uniquely defined at $P_0$. Suppose that $x \notin \partial E(\mu_0, \Sigma_0, \rho_0)$. If $\Lambda$ has a non-singular derivative at $\theta_0$, then the influence function of $\Theta$ at $P_0$ is given by

$$\mathrm{IF}(x; \Theta, P_0) = -\Lambda'(\theta_0)^{-1}\Psi(x, \theta_0),$$

where $\Psi$ is defined in (5.3).

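From definition (5.3), we see that $\mathrm{IF}(x; \Theta, P_0)$ is bounded uniformly for $x \notin \partial E(\mu_0, \Sigma_0, \rho_0)$. When $x \in \partial E(\mu_0, \Sigma_0, \rho_0)$, it is not clear what happens with $\phi_{\varepsilon,x}(x)$ as $\varepsilon \downarrow 0$. However, recall that there exists $R > 0$ such that $\{\phi_{\varepsilon,x} \ne 0\} \subset B_R$ for $\varepsilon > 0$ sufficiently small. This still implies that if $\phi_{\varepsilon,x}(x)$ has a limit as $\varepsilon \downarrow 0$, then $\mathrm{IF}(x; \Theta, P_0)$ exists and is bounded. In the case that $\phi_{\varepsilon,x}(x)$ does not have a limit as $\varepsilon \downarrow 0$, we can still conclude that $\theta_{\varepsilon,x} - \theta_0 = O(\varepsilon)$, uniformly for $x \in \partial E(\mu_0, \Sigma_0, \rho_0)$.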

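Because $\Sigma_{\varepsilon,x} - \Sigma_0 = \Gamma_0(\Gamma_{\varepsilon,x} - \Gamma_0) + (\Gamma_{\varepsilon,x} - \Gamma_0)\Gamma_0 + (\Gamma_{\varepsilon,x} - \Gamma_0)^2$, where the last term is $O(\varepsilon^2)$ as $\varepsilon \downarrow 0$, it follows that the influence function of the covariance functional $\Sigma(P) = C_P(\phi_P)$ is given by

$$\mathrm{IF}(x; \Sigma, P_0) = \Gamma_0\,\mathrm{IF}(x; \Gamma, P_0) + \mathrm{IF}(x; \Gamma, P_0)\,\Gamma_0.$$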
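In the univariate standard normal case ($k = 1$, $P_0 = N(0,1)$, with $\Lambda'(\theta_0)$ block diagonal by symmetry), Theorem 5.2 gives for the location component $\mathrm{IF}(x; \mu, P_0) = x\,1_{\{|x| \le q\}}/(\gamma - 2q\varphi(q))$ for $|x| \ne q$, where $P_0(|X| \le q) = \gamma$ and $\varphi$ is the standard normal density. The sketch below (hypothetical helper names, our own illustration) tabulates this function, confirms that it vanishes outside $[-q, q]$ and is bounded by its gross error sensitivity $q/(\gamma - 2q\varphi(q))$, and checks numerically that $\mathbb{E}\,\mathrm{IF}(X; \mu, P_0)^2$ agrees with the asymptotic variance $1/(\gamma - 2q\varphi(q))$, as the von Mises expansion suggests.

```python
import math


def std_normal_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def std_normal_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def half_width(gamma):
    """q with P(|X| <= q) = gamma for X ~ N(0, 1), by bisection."""
    lo, hi = 0.0, 10.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if 2.0 * std_normal_cdf(mid) - 1.0 < gamma:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def if_location(x, gamma):
    """IF(x; mu, P_0) of the univariate MCD location at N(0, 1), for |x| != q."""
    q = half_width(gamma)
    return x / (gamma - 2.0 * q * std_normal_pdf(q)) if abs(x) <= q else 0.0


def gross_error_sensitivity(gamma):
    """sup_x |IF(x; mu, P_0)| = q / (gamma - 2 q phi(q))."""
    q = half_width(gamma)
    return q / (gamma - 2.0 * q * std_normal_pdf(q))


def second_moment_of_if(gamma, n=4000):
    """E[IF(X; mu, P_0)^2] for X ~ N(0, 1), midpoint rule on [-q, q]."""
    q = half_width(gamma)
    c = gamma - 2.0 * q * std_normal_pdf(q)
    step = 2.0 * q / n
    total = 0.0
    for i in range(n):
        y = -q + (i + 0.5) * step
        total += (y / c) ** 2 * std_normal_pdf(y)
    return total * step


if __name__ == "__main__":
    for x in (-3.0, -0.5, 0.0, 0.5, 3.0):
        print(x, if_location(x, 0.5))
    print("gross error sensitivity:", gross_error_sensitivity(0.5))
    print("E[IF^2]:", second_moment_of_if(0.5))
```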

As a special case of Theorem 5.2, when $P_0$ has an elliptically contoured density, Theorem 1 in [5] may be recovered (see [4]). Finally, note that together with Theorem 5.1 it turns out that the von Mises expansion indeed holds, i.e.,

$$\hat\theta_n - \theta_0 = \frac1n\sum_{i=1}^n\mathrm{IF}(X_i; \Theta, P_0) + o_P(n^{-1/2}),$$

which includes the heuristic (1.2).

On the basis of Theorems 5.1 and 5.2 one could compute asymptotic variances and robustness performance measures for the MCD estimators and compare them with other robust competitors. Assuming the influence function to exist and the expansion (1.2) to be valid, Croux and Haesbroeck [5] provide an extensive account of asymptotic and finite sample relative efficiencies for the components of the MCD covariance estimator separately at the multivariate standard normal, a contaminated multivariate normal and at several multivariate Student distributions, for a variety of dimensions $k = 2, 3, 5, 10, 30$ and $\gamma = 0.5, 0.75$, as well as a comparison with S-estimators and reweighted versions. Of particular interest would be a comparison with the Stahel-Donoho (SD) estimator. Its asymptotic properties have been established by Zuo et al. [26, 27, 28], who also report an asymptotic and finite sample efficiency index for the SD location estimator and for the full SD covariance estimator at the multivariate normal and contaminated normal, as well as a gross error sensitivity index and maximum bias curve for the SD covariance estimator. The first impression is that overall, apart from computational issues, the SD estimator performs better than the MCD. However, an honest comparison would require comparing the same measure of efficiency and the maximum bias curves. Determining the latter seems far from trivial for the MCD, and we defer such a comparison to future research.

6. Appendix. Because the proof of Theorem 2.1 relies heavily on the proof of the results 
in Sections 3 and 4, this proof is postponed to Subsection 6.2. 

6.1. Proofs of existence and characterization (Section 3). For $a \in \mathbb{R}^k$, $\|a\| = 1$, $\mu \in \mathbb{R}^k$ and $r_2 \ge r_1 \ge 0$, define the cylinder

$$H(a, \mu, [r_1, r_2]) = \left\{x \in \mathbb{R}^k : r_1^2 \le (a'(x - \mu))^2 \le r_2^2\right\}, \tag{6.1}$$

and write $H(a, \mu, r)$ for $H(a, \mu, [0, r])$. The proof of Proposition 3.1 relies on the following lemma.

Lemma 6.1 Suppose $P_0$ satisfies (3.1) and let $P_t \to P_0$ weakly. Then there exist $\varepsilon > 0$ and $t_0 \ge 1$ such that for $t = 0$ and all $t \ge t_0$, all $a \in \mathbb{R}^k$ with $\|a\| = 1$, all $\mu \in \mathbb{R}^k$ and all $r \ge 0$ with $P_t(H(a, \mu, r)) \ge \gamma - \varepsilon$, we have

$$\int_{H(a, \mu, r)}(a'(x - \mu))^2\,P_t(dx) \ge \varepsilon.$$



Proof: We start by showing that there exist $\delta > 0$ and $t_0 \ge 1$ such that for all hyperplanes $H$ and $t \ge t_0$, we have $P_t(H) \le \gamma - \delta$. For suppose there exist a sequence $t_m \to \infty$ and hyperplanes $H_m$ with

$$P_{t_m}(H_m) > \gamma - \frac1m.$$

Choose $\eta > 0$ small. There exists a compact set $K \subset \mathbb{R}^k$ such that for all $t \ge 0$, $P_t(K) \ge 1 - \eta$. This means that for $m$ large enough, if $P_{t_m}(H_m) > \gamma - 1/m$, we must have $H_m \cap K \ne \emptyset$. So we can choose $\mu_m \in K$ and $a_m \in S^{k-1}$ (the unit sphere in $\mathbb{R}^k$), with $H_m = H(a_m, \mu_m, 0)$, as defined in (6.1). Now, by passing to a subsequence if necessary, we may assume that $a_m \to a_0$ and $\mu_m \to \mu_0$.

Let $H_0 = H(a_0, \mu_0, 0)$, as defined by (6.1). Choose a small ball $B$ around the origin. Since $K$ is compact, the functions $x \mapsto a_m'(x - \mu_m)$ are uniformly equicontinuous on $K$, which proves that there exists $m_0 \ge 1$ such that $H_m \cap K \subset H_0 + B$ for all $m \ge m_0$. Furthermore, there exists a continuous bounded function $\phi$ such that $1_{H_0 + B} \le \phi \le 1_{H_0 + 2B}$. We can increase $m_0$ such that for all $m \ge m_0$,

$$\int\phi\,dP_{t_m} \le \int\phi\,dP_0 + \eta \quad\text{and}\quad P_{t_m}(H_m) > \gamma - \eta.$$
We finally conclude that for $m \ge m_0$,

$$\begin{aligned}
P_0(H_0 + 2B) &\ge \int\phi\,dP_0 \ge \int\phi\,dP_{t_m} - \eta\\
&\ge P_{t_m}(H_0 + B) - \eta \ge P_{t_m}(H_m \cap K) - \eta \ge P_{t_m}(H_m) - 2\eta > \gamma - 3\eta.
\end{aligned} \tag{6.2}$$

Since $B$ and $\eta$ are arbitrary, this would show that $P_0(H_0) \ge \gamma$, which contradicts (3.1).
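So now we can choose $\delta > 0$ and $t_0 \ge 1$ such that

$$P_t(H) \le \gamma - \delta \quad\text{and}\quad P_0(H) \le \gamma - \delta, \tag{6.3}$$

for all hyperplanes $H$ and $t \ge t_0$. Next, suppose that there exist a sequence $t_m \to \infty$, $a_m \in S^{k-1}$, $\mu_m \in \mathbb{R}^k$, $r_m \ge 0$, and cylinders $C_m = H(a_m, \mu_m, r_m)$, as defined by (6.1), such that

$$P_{t_m}(C_m) \ge \gamma - \frac1m \quad\text{and}\quad \int_{C_m}(a_m'(x - \mu_m))^2\,P_{t_m}(dx) < \frac1m. \tag{6.4}$$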

Choose a compact set $K$ such that $P_t(K) \ge 1 - (\gamma - \delta/3)/2$ for all $t \ge 0$. Since

$$P_{t_m}(C_m \cap K) \ge (\gamma - \delta/3)/2,$$

we can always choose $\mu_m$ relatively close to $K$, since otherwise the integral in (6.4) becomes unbounded. So we can restrict $\mu_m$ to a compact set and assume that $a_m \to a_0$ and $\mu_m \to \mu_0$. Furthermore, we can bound the $r_m$, since the condition $P_{t_m}(C_m) \ge \gamma - \delta/3$ can be satisfied by bounded $r_m$. This means that we can also assume that $r_m \to r_0$. Let $C_0 = H(a_0, \mu_0, r_0)$, as defined by (6.1). An argument similar to (6.2) shows that if $r_0 = 0$, we would get $P_0(H(a_0, \mu_0, 0)) \ge \gamma - \delta/2$, which is a contradiction, so $r_0 > 0$. There exists $m_0$ such that for all $m \ge m_0$,

$$\begin{aligned}
\frac1m &> \int_{C_m}(a_m'(x - \mu_m))^2\,P_{t_m}(dx) \ge \int_{H(a_m, \mu_m, [r_m/2, r_m])}(a_m'(x - \mu_m))^2\,P_{t_m}(dx)\\
&\ge \frac{r_m^2}{4}P_{t_m}\left(H(a_m, \mu_m, [r_m/2, r_m])\right) \ge \frac{r_0^2}{8}P_{t_m}\left(H(a_m, \mu_m, [r_m/2, r_m])\right).
\end{aligned}$$

Together with (6.4), this implies that by increasing $m_0$, we have for all $m \ge m_0$,

$$P_{t_m}(H(a_m, \mu_m, r_m/2)) \ge \gamma - \delta/2.$$

By means of an argument similar to (6.2), this shows that

$$P_0(H(a_0, \mu_0, r_0/2)) \ge \gamma - \delta/2. \tag{6.5}$$

Now, choose $\eta > 0$. Then there exist a continuous bounded function $\phi$, a compact set $K'$ with $P_t(K') \ge 1 - \eta$ for all $t \ge 1$, and $m_0 \ge 1$, such that

$$\phi \le r_0^2,\qquad \phi \ge (a_0'(x - \mu_0))^2\,1_{H(a_0, \mu_0, r_0/2)} - \eta \quad\text{and}\quad \phi\cdot1_{K'} \le (a_m'(x - \mu_m))^2\,1_{C_m},$$

for all $m \ge m_0$. Increase $m_0$ such that, with (6.4), for all $m \ge m_0$,

$$\int\phi\,dP_{t_m} \ge \int\phi\,dP_0 - \eta \quad\text{and}\quad \int_{C_m}(a_m'(x - \mu_m))^2\,dP_{t_m} \le \eta.$$
It follows that for all $m \ge m_0$,

$$\begin{aligned}
\int_{H(a_0, \mu_0, r_0/2)}(a_0'(x - \mu_0))^2\,P_0(dx) &\le \int\phi\,dP_0 + \eta \le \int\phi\,dP_{t_m} + 2\eta\\
&\le \int1_{K'}\phi\,dP_{t_m} + (2 + r_0^2)\eta\\
&\le \int_{C_m}(a_m'(x - \mu_m))^2\,dP_{t_m} + (2 + r_0^2)\eta \le (3 + r_0^2)\eta.
\end{aligned}$$

Since $\eta > 0$ was arbitrary, this proves that

$$\int_{H(a_0, \mu_0, r_0/2)}(a_0'(x - \mu_0))^2\,P_0(dx) = 0.$$

Together with (6.5) this would show that $P_0(H(a_0, \mu_0, 0)) \ge \gamma - \delta/2$, which is in contradiction with (6.3). ■

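Proof of Proposition 3.1: Suppose $a \in \mathbb{R}^k$ with $\|a\| = 1$ and $\phi \in K_t(\gamma)$. Write $\mu_t = T_t(\phi)$ and define

$$s_t = \inf\{s \ge 0 : P_t(H(a, \mu_t, s)) \ge \gamma\},$$

with $H(a, \mu_t, s)$ as defined in (6.1). Let $H_t = H(a, \mu_t, s_t)$ and choose $0 \le \tau \le 1$ such that $P_t(H_t^\circ) + \tau P_t(\partial H_t) = \gamma$. Since $\int\phi\,dP_t \ge \gamma$, we have

$$\int_{\mathbb{R}^k\setminus H_t^\circ}\phi\,dP_t \ge \int_{H_t^\circ}(1 - \phi)\,dP_t + \tau P_t(\partial H_t).$$

This implies that

$$\int_{\mathbb{R}^k\setminus H_t^\circ}(a'(x - \mu_t))^2\phi\,dP_t \ge \int_{H_t^\circ}(a'(x - \mu_t))^2(1 - \phi)\,dP_t + \tau\int_{\partial H_t}(a'(x - \mu_t))^2\,dP_t.$$

Therefore, with $h_t = \int\phi\,dP_t \le 1$, we find

$$a'C_t(\phi)a = \frac1{h_t}\int(a'(x - \mu_t))^2\phi(x)\,P_t(dx) \ge \frac1{h_t}\int_{H_t^\circ}(a'(x - \mu_t))^2\,P_t(dx) + \frac{\tau}{h_t}\int_{\partial H_t}(a'(x - \mu_t))^2\,P_t(dx).$$

Choose $t_0 \ge 1$ and $\varepsilon > 0$ according to Lemma 6.1 and consider $t \ge t_0$. If $P_t(H_t^\circ) \ge \gamma - \varepsilon/2$, there exists $0 \le u_t < s_t$ such that $P_t(H(a, \mu_t, u_t)) \ge \gamma - \varepsilon$. According to Lemma 6.1, this means

$$a'C_t(\phi)a \ge \frac1{h_t}\int_{H_t^\circ}(a'(x - \mu_t))^2\,P_t(dx) \ge \frac1{h_t}\int_{H(a, \mu_t, u_t)}(a'(x - \mu_t))^2\,P_t(dx) \ge \frac{\varepsilon}{h_t} \ge \varepsilon.$$

If $P_t(H_t^\circ) < \gamma - \varepsilon/2$, then $\tau > \varepsilon/2$. Because $P_t(H_t) \ge \gamma > \gamma - \varepsilon$, again according to Lemma 6.1, we find (we can always choose $\varepsilon \le 2$)

$$a'C_t(\phi)a \ge \frac{\varepsilon}{2h_t}\int_{H_t^\circ}(a'(x - \mu_t))^2\,P_t(dx) + \frac{\varepsilon}{2h_t}\int_{\partial H_t}(a'(x - \mu_t))^2\,P_t(dx) = \frac{\varepsilon}{2h_t}\int_{H_t}(a'(x - \mu_t))^2\,P_t(dx) \ge \frac{\varepsilon^2}{2}.$$

This finishes the proof. ■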

The proof of Proposition 3.2 relies on two lemmas. The first one is a direct consequence of Proposition 3.1 and shows that if $\det(C_t(\phi))$ is bounded uniformly in $t$ and $\phi$, then there exists a fixed compact set that contains all $(T_t(\phi), C_t(\phi))$ eventually. The second lemma is a useful property involving the determinants of two non-negative symmetric matrices. Furthermore, for $R > 0$ and $\mu \in \mathbb{R}^k$, define

$$B(\mu, R) = \left\{x \in \mathbb{R}^k : \|x - \mu\| \le R\right\}, \tag{6.6}$$

and write $B_R$ in case $\mu = 0$.

Lemma 6.2 Suppose $P_0$ satisfies (3.1) and let $P_t \to P_0$ weakly. Fix $M > 0$. Then there exist $t_0 \ge 1$, $0 < \lambda_0 < \lambda_1 < \infty$ and $L, \rho > 0$, such that for $t = 0$, all $t \ge t_0$, all $\phi$ such that $T_t(\phi)$, $C_t(\phi)$ exist, $\int\phi\,dP_t \ge \gamma$, and

$$\det(C_t(\phi)) \le M,$$

we have that all eigenvalues of $C_t(\phi)$ are between $\lambda_0$ and $\lambda_1$, $\|T_t(\phi)\| \le L$, and $r_t(\phi) \le \rho$.

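Proof: The existence of $\lambda_0$ follows directly from Proposition 3.1. This also implies that the largest eigenvalue $\lambda_{\max}$ of $C_t(\phi)$ is at most $\lambda_1 = M/\lambda_0^{k-1}$. Next, choose $R > 0$ such that for all $t \ge 0$, $P_t(B_R) \ge 1 - \gamma/2$, with $B_R$ as defined in (6.6). Suppose $\|T_t(\phi)\| > R$ and, according to Lemma 2.1, assume without loss of generality that $\int\phi\,dP_t = \gamma$. Then, since

$$\int_{B_R}\phi\,dP_t = \int\phi\,dP_t - \int_{\mathbb{R}^k\setminus B_R}\phi\,dP_t \ge \gamma - (1 - P_t(B_R)) \ge \frac\gamma2,$$

we find, with the unit vector $a = T_t(\phi)/\|T_t(\phi)\|$,

$$\lambda_1 \ge a'C_t(\phi)a = \frac1\gamma\int\left(\frac{T_t(\phi)'(T_t(\phi) - x)}{\|T_t(\phi)\|}\right)^2\phi(x)\,P_t(dx) \ge \frac1\gamma\int_{B_R}(\|T_t(\phi)\| - R)^2\phi(x)\,P_t(dx) \ge \frac12(\|T_t(\phi)\| - R)^2.$$

This proves that there exists $L > 0$, depending on $R$, $M$ and $\lambda_0$, such that $\|T_t(\phi)\| \le L$. Finally, since

$$(x - T_t(\phi))'C_t(\phi)^{-1}(x - T_t(\phi)) \le \frac{(\|x\| + L)^2}{\lambda_0},$$

for $\rho > 0$ large enough, the ellipsoid $E(T_t(\phi), C_t(\phi), \rho)$, as defined in (2.2), contains the ball $B(0, \rho\sqrt{\lambda_0} - L)$. Choose $\rho > 0$ large enough, such that

$$P_t\left(B(0, \rho\sqrt{\lambda_0} - L)\right) \ge \gamma,$$

for all $t \ge 0$. Then $P_t(E(T_t(\phi), C_t(\phi), \rho)) \ge \gamma$ and by definition we must have $r_t(\phi) \le \rho$. ■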
Lemma 6.3 Let $\Sigma_1$ and $\Sigma_2$ be two symmetric matrices, non-negative definite and positive definite, respectively, such that $\mathrm{Tr}(\Sigma_2^{-1}(\Sigma_1 - \Sigma_2)) < 0$. Then $\det(\Sigma_1) < \det(\Sigma_2)$. A similar result holds with $\le$ instead of strict inequalities.

Proof: Without loss of generality we may assume that $\Sigma_2 = I_k$ (replace $\Sigma_1$ by $\Sigma_2^{-1/2}\Sigma_1\Sigma_2^{-1/2}$, which changes neither the trace condition nor the ratio of the determinants). Suppose $\mathrm{Tr}(\Sigma_1) < \mathrm{Tr}(I_k)$. This means that the eigenvalues $\lambda_1, \ldots, \lambda_k$ of $\Sigma_1$ satisfy $(\lambda_1 + \cdots + \lambda_k)/k < 1$. By means of the inequality between the arithmetic mean and the geometric mean of non-negative numbers, we find $\det(\Sigma_1) = \lambda_1\cdots\lambda_k < 1$. ■
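Lemma 6.3 is easy to probe numerically. The sketch below (our own illustration, restricted to $2\times2$ matrices so that it needs only the Python standard library; helper names are hypothetical) draws random pairs $\Sigma_1 = B_1B_1'$ and $\Sigma_2 = B_2B_2' + 0.1\,I_2$ from independent Gaussian matrices $B_1, B_2$, keeps the pairs with $\mathrm{Tr}(\Sigma_2^{-1}(\Sigma_1 - \Sigma_2)) < 0$ (up to a small margin that guards against rounding), and counts how often $\det(\Sigma_1) < \det(\Sigma_2)$ fails. Such a check illustrates the statement; it does not replace the AM-GM argument.

```python
import random


def det2(a):
    return a[0][0] * a[1][1] - a[0][1] * a[1][0]


def inv2(a):
    d = det2(a)
    return [[a[1][1] / d, -a[0][1] / d], [-a[1][0] / d, a[0][0] / d]]


def matmul2(a, b):
    return [[sum(a[i][l] * b[l][j] for l in range(2)) for j in range(2)] for i in range(2)]


def random_psd(rng, jitter):
    """B B' + jitter * I for a random 2x2 Gaussian matrix B (PSD; PD if jitter > 0)."""
    b = [[rng.gauss(0.0, 1.0) for _ in range(2)] for _ in range(2)]
    bt = [[b[0][0], b[1][0]], [b[0][1], b[1][1]]]
    bbt = matmul2(b, bt)
    return [[bbt[i][j] + (jitter if i == j else 0.0) for j in range(2)] for i in range(2)]


def check_lemma_6_3(trials=20000, seed=0, margin=1e-6):
    """Count pairs with Tr(S2^{-1}(S1 - S2)) < -margin and violations of det(S1) < det(S2)."""
    rng = random.Random(seed)
    checked = violations = 0
    for _ in range(trials):
        s1 = random_psd(rng, 0.0)  # non-negative definite
        s2 = random_psd(rng, 0.1)  # positive definite
        diff = [[s1[i][j] - s2[i][j] for j in range(2)] for i in range(2)]
        m = matmul2(inv2(s2), diff)
        if m[0][0] + m[1][1] < -margin:
            checked += 1
            if not det2(s1) < det2(s2):
                violations += 1
    return checked, violations


if __name__ == "__main__":
    print(check_lemma_6_3())
```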

We also need the following well known result:

Lemma 6.4 Suppose $Q$ is a probability measure on $\mathbb{R}^k$ such that $\int\|x\|^2\,Q(dx) < +\infty$ and $Q$ is not supported by a hyperplane. Define $\mu = \int x\,Q(dx)$. Then for all $a \in \mathbb{R}^k$, $a \ne \mu$, we have

$$\det\left(\int(x - \mu)(x - \mu)'\,Q(dx)\right) < \det\left(\int(x - a)(x - a)'\,Q(dx)\right).$$

Proof: First note that

$$\int(x - a)(x - a)'\,Q(dx) = \int(x - \mu)(x - \mu)'\,Q(dx) + (a - \mu)(a - \mu)',$$

then apply Lemma 6.3, remembering that $\int(x - a)(x - a)'\,Q(dx)$ is invertible and therefore strictly positive definite. ■
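For an empirical measure $Q$ on a finite set of points in $\mathbb{R}^2$, the decomposition used in the proof of Lemma 6.4 and its consequence can be checked directly. The helper names below are hypothetical, and the sketch only illustrates the lemma for one simulated point cloud.

```python
import random


def det2(m):
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]


def scatter_about(points, a):
    """(1/n) sum_i (x_i - a)(x_i - a)' for 2-D points: the integral in Lemma 6.4 for empirical Q."""
    n = len(points)
    sxx = sum((p[0] - a[0]) ** 2 for p in points) / n
    syy = sum((p[1] - a[1]) ** 2 for p in points) / n
    sxy = sum((p[0] - a[0]) * (p[1] - a[1]) for p in points) / n
    return [[sxx, sxy], [sxy, syy]]


def mean_of(points):
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)


def lemma_6_4_gap(points, a):
    """det(scatter about a) - det(scatter about the mean); positive whenever a != mean."""
    return det2(scatter_about(points, a)) - det2(scatter_about(points, mean_of(points)))


if __name__ == "__main__":
    rng = random.Random(1)
    pts = [(rng.gauss(0.0, 1.0), rng.gauss(0.0, 2.0)) for _ in range(50)]
    for a in [(1.0, 0.0), (0.0, -3.0)]:
        print(a, lemma_6_4_gap(pts, a))
```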



Proof of Proposition 3.2: Choose $t_0 \ge 1$ and $\lambda_0 > 0$ according to Proposition 3.1. Choose $R' > 0$ such that $P_t(B_{R'}) > \gamma$ for all $t \ge 0$. Let $\psi_0$ be a continuous bounded function such that $1_{B_{R'}} \le \psi_0 \le 1_{B_{R'+1}}$, and define $D_0 = 2\det(C_0(\psi_0))$. Because $P_t \to P_0$ weakly and $\psi_0$ has bounded support, we have

$$\int\psi_0\,dP_t \to \int\psi_0\,dP_0,\qquad \int\psi_0(x)x\,P_t(dx) \to \int\psi_0(x)x\,P_0(dx),\qquad \int\psi_0(x)xx'\,P_t(dx) \to \int\psi_0(x)xx'\,P_0(dx),$$

and hence $C_t(\psi_0) \to C_0(\psi_0)$, so that for $t$ large enough, $\det(C_t(\psi_0)) < D_0$.

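Now, consider $\phi \in K_t(\gamma)$. If $\det(C_t(\phi)) \ge D_0 > \det(C_t(\psi_0))$, then we are done, because $\psi_0 \in K_t^{R'+1}(\gamma)$. Therefore, suppose that $\det(C_t(\phi)) < D_0$. According to Lemma 6.2, this implies that there exist $\lambda_1 > \lambda_0 > 0$ and $L > 0$ such that

$$\lambda_0 \le \lambda_{\min}(C_t(\phi)) \le \lambda_{\max}(C_t(\phi)) \le \lambda_1 \quad\text{and}\quad \|T_t(\phi)\| \le L, \tag{6.7}$$

uniformly in $t$ and $\phi$. According to Lemma 2.1, we may assume that $\int\phi\,dP_t = \gamma$. Choose any $R > R' + 1$ and suppose that $\phi$ does not vanish outside $B_R$, i.e., $\int_{\mathbb{R}^k\setminus B_R}\phi\,dP_t > 0$. Because $\int\phi\,dP_t = \gamma$ and $P_t(B_{R'}) > \gamma$, we know that

$$\int_{B_{R'}}(1 - \phi)\,dP_t > \int_{\mathbb{R}^k\setminus B_R}\phi\,dP_t.$$

Define $h_1 = 1_{\mathbb{R}^k\setminus B_R}\cdot\phi$ and $h_2 = \tau1_{B_{R'}}(1 - \phi)$, where we choose $0 < \tau < 1$ such that

$$\int h_2\,dP_t = \int_{B_{R'}}\tau(1 - \phi)\,dP_t = \int h_1\,dP_t. \tag{6.8}$$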




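Furthermore, define $\psi = \phi - h_1 + h_2$ and note that $\psi \in K_t^R(\gamma)$. Because, according to (6.8), $\int\psi\,dP_t = \int\phi\,dP_t$, we can write

$$\begin{aligned}
\det(C_t(\psi)) &= \det\left(\frac1{\int\psi\,dP_t}\int(x - T_t(\psi))(x - T_t(\psi))'\psi(x)\,P_t(dx)\right)\\
&\le \det\left(\frac1{\int\psi\,dP_t}\int(x - T_t(\phi))(x - T_t(\phi))'\psi(x)\,P_t(dx)\right)\\
&= \det\left(C_t(\phi) + \frac1{\int\phi\,dP_t}\int(x - T_t(\phi))(x - T_t(\phi))'(\psi(x) - \phi(x))\,P_t(dx)\right).
\end{aligned} \tag{6.9}$$

For the inequality we used Lemma 6.4. So according to Lemma 6.3, it suffices to show that

$$\frac1{\int\phi\,dP_t}\int(x - T_t(\phi))'C_t(\phi)^{-1}(x - T_t(\phi))(h_2(x) - h_1(x))\,P_t(dx) < 0. \tag{6.10}$$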

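To see that this is true, note that with (6.7) we get

$$\int(x - T_t(\phi))'C_t(\phi)^{-1}(x - T_t(\phi))h_2(x)\,P_t(dx) \le \frac1{\lambda_0}\int\|x - T_t(\phi)\|^2h_2(x)\,P_t(dx) \le \frac{(R' + L)^2}{\lambda_0}\int h_2\,dP_t,$$

and

$$\int(x - T_t(\phi))'C_t(\phi)^{-1}(x - T_t(\phi))h_1(x)\,P_t(dx) \ge \frac1{\lambda_1}\int\|x - T_t(\phi)\|^2h_1(x)\,P_t(dx) \ge \frac{(R - L)^2}{\lambda_1}\int h_1\,dP_t.$$

So, together with (6.8), for $R$ large enough (but independent of $\phi$), this proves (6.10). ■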

Proof of Theorem 3.1: Choose $R > 0$ and $t_0 \ge 1$ according to Proposition 3.2. Then for $t = 0$ and $t \ge t_0$, we may restrict minimization to $\phi \in K_t^R(\gamma)$. Since $K_t^R(\gamma)$ is a weak*-compact subset of $L^\infty(P_t|_{B_R})$, and since $\phi \mapsto \det(C_t(\phi))$ is a weak*-continuous function on this space, we conclude that there exists at least one minimum. ■

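Proof of Theorem 3.2: First, only consider minimizing functions with $\int\phi\,dP = \gamma$, which is always possible according to Lemma 2.1. Write

$$E_\phi = E(T_P(\phi), C_P(\phi), r_P(\phi)) \quad\text{and}\quad E_{\phi,\delta} = E(T_P(\phi), C_P(\phi), r_P(\phi) + \delta),$$

as defined by (2.2) and (2.3), and suppose that $P(\phi > 1_{E_\phi}) > 0$. Then there exists $\delta > 0$ such that $P(\phi > 1_{E_{\phi,\delta}}) > 0$. Since $P(E_\phi) \ge \gamma = \int\phi\,dP$, we have

$$0 < \int_{\mathbb{R}^k\setminus E_{\phi,\delta}}\phi\,dP \le \int_{E_\phi}(1 - \phi)\,dP.$$

Define $0 < \tau \le 1$ such that

$$h_1 = \phi\cdot1_{\mathbb{R}^k\setminus E_{\phi,\delta}},\qquad h_2 = \tau(1 - \phi)\cdot1_{E_\phi}\qquad\text{and}\qquad\int h_1\,dP = \int h_2\,dP. \tag{6.11}$$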
Note that $\psi = \phi - h_1 + h_2 \in K_P(\gamma)$. Using the same argument as in (6.9) and the fact that

$$\int(x - T_P(\phi))'C_P(\phi)^{-1}(x - T_P(\phi))h_2(x)\,P(dx) \le r_P(\phi)^2\int h_2(x)\,P(dx),$$

$$\int(x - T_P(\phi))'C_P(\phi)^{-1}(x - T_P(\phi))h_1(x)\,P(dx) \ge (r_P(\phi) + \delta)^2\int h_1(x)\,P(dx),$$

together with (6.11) and the fact that $\delta > 0$, this would mean

$$\frac1{\int\phi\,dP}\int(x - T_P(\phi))'C_P(\phi)^{-1}(x - T_P(\phi))(h_2(x) - h_1(x))\,P(dx) < 0.$$

According to Lemma 6.3, this would imply $\det(C_P(\psi)) < \det(C_P(\phi))$, which contradicts the fact that $\phi$ minimizes $\det(C_P(\phi))$. This proves that $\phi \le 1_{E_\phi}$.

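On the other hand, suppose that $P(\phi < 1_{E_\phi^\circ}) > 0$. This means there exists a $\delta < 0$ such that $P(\phi < 1_{E_{\phi,\delta}}) > 0$. Since $P(E_\phi^\circ) \le \gamma$ and $\phi \le 1_{E_\phi}$, we know that

$$0 < \int_{E_{\phi,\delta}}(1 - \phi)\,dP \le \int_{E_\phi^\circ}(1 - \phi)\,dP \le \int_{\partial E_\phi}\phi\,dP.$$

Define $h_2 = (1 - \phi)\cdot1_{E_{\phi,\delta}}$ and note that by assumption $\int h_2\,dP > 0$. Then define $0 < \tau \le 1$ and $h_1$ such that

$$h_1 = \tau\cdot\phi\cdot1_{\partial E_\phi}\quad\text{and}\quad\int h_1\,dP = \int h_2\,dP.$$

Again note that $\psi = \phi - h_1 + h_2 \in K_P(\gamma)$, and by a similar argument as before, we would conclude that $\det(C_P(\psi)) < \det(C_P(\phi))$, which is a contradiction. This shows $1_{E_\phi^\circ} \le \phi$.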

Now, suppose that $\int\phi\,dP > \gamma$. Then, according to Lemma 2.1, for some $0 < \lambda < 1$, the function $0 \le \lambda\phi \le 1$ would also be minimizing and satisfy $\int\lambda\phi\,dP = \gamma$. But then the argument above shows that $\lambda\phi = 1$ on the interior of its own ellipsoid $E(T_P(\lambda\phi), C_P(\lambda\phi), r_P(\lambda\phi))$, which is a contradiction. We conclude that we must have $\int\phi\,dP = \gamma$.

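The last statement of the theorem is a little bit more subtle. Suppose $P(\partial E_\phi) > 0$, since otherwise the statement is trivially true. Consider the following two functions on $[0, 1]$:

$$f_1(t) = \frac{P(\partial E_\phi \cap \{\phi \le t\})}{P(\partial E_\phi)},\qquad f_2(t) = \frac{P(\partial E_\phi \cap \{\phi \ge t\})}{P(\partial E_\phi)}.$$

If one realizes that $f_1 + f_2 \ge 1$, $f_1$ is non-decreasing and continuous from the right, whereas $f_2$ is non-increasing and continuous from the left, it is not hard to see that either $f_1 = 1$ on $[0, 1]$, in which case $\phi = 0$, $P$-a.e. on $\partial E_\phi$, or $f_2 = 1$ on $[0, 1]$, in which case $\phi = 1$, $P$-a.e. on $\partial E_\phi$, or there exists $t \in (0, 1)$ such that $f_1(t), f_2(t) > 0$. For this $t \in (0, 1)$, define

$$A = \partial E_\phi \cap \{\phi \le t\}\quad\text{and}\quad B = \partial E_\phi \cap \{\phi \ge t\}.$$

Either $P(A \cup B) = P(\{x\})$ for some $x \in \partial E_\phi$, in which case $P(\partial E_\phi) = P(\{x\})$, or there exist $x \in \mathrm{supp}(P|_A)$ and $y \in \mathrm{supp}(P|_B)$ with $x \ne y$. We will show that this last assumption leads to a contradiction, thereby finishing the proof. Choose $\varepsilon > 0$ such that $\varepsilon < \|x - y\|/3$. Define $A_\varepsilon = A \cap B(x, \varepsilon)$ and $B_\varepsilon = B \cap B(y, \varepsilon)$. By the choice of $x$ and $y$ we know that $P(A_\varepsilon), P(B_\varepsilon) > 0$. Choose $0 < \eta < \min(tP(B_\varepsilon), (1 - t)P(A_\varepsilon))$ and define

$$h_1 = \frac{\eta}{P(A_\varepsilon)}1_{A_\varepsilon},\qquad h_2 = \frac{\eta}{P(B_\varepsilon)}1_{B_\varepsilon}.$$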
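Since $\phi \le t$ on $A_\varepsilon$ and $\phi \ge t$ on $B_\varepsilon$, we get that

$$\psi = \phi + h_1 - h_2 \in K_P(\gamma).$$

Furthermore, $\int\psi\,dP = \int\phi\,dP = \gamma$. Since $\varepsilon < \|x - y\|/3$, the conditional means of $P$ on $A_\varepsilon$ and on $B_\varepsilon$ lie in the disjoint balls $B(x, \varepsilon)$ and $B(y, \varepsilon)$, respectively, so that $T_P(\psi) \ne T_P(\phi)$. Since $C_P(\psi)$ is invertible, this means that (with strict inequality due to Lemma 6.4)

$$\begin{aligned}
\det(C_P(\psi)) &= \det\left(\frac1\gamma\int(z - T_P(\psi))(z - T_P(\psi))'\psi(z)\,P(dz)\right)\\
&< \det\left(\frac1\gamma\int(z - T_P(\phi))(z - T_P(\phi))'\psi(z)\,P(dz)\right)\\
&= \det\left(C_P(\phi) + \frac1\gamma\int(z - T_P(\phi))(z - T_P(\phi))'(h_1(z) - h_2(z))\,P(dz)\right).
\end{aligned} \tag{6.12}$$

Since $A_\varepsilon \cup B_\varepsilon \subset \partial E_\phi$, and we know that $(z - T_P(\phi))'C_P(\phi)^{-1}(z - T_P(\phi))$ is constant for $z \in \partial E_\phi$, we can use Lemma 6.3 to conclude that

$$\det(C_P(\psi)) < \det(C_P(\phi)),$$

which contradicts the minimizing property of $\phi$. ■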
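At the sample level, the characterization in Theorem 3.2 says that a minimizing subsample is separated from the remaining points by its own ellipsoid. The sketch below, a hypothetical brute-force illustration for $k = 2$ and small $n$ with helper names of our own, finds an optimal subsample of size $h = \lceil n\gamma\rceil$ by exhaustive search and returns the gap between the smallest squared Mahalanobis distance outside the subsample and the largest one inside it (both with respect to the subsample's own trimmed mean and covariance). For data in general position we expect a strictly positive gap; with ties or atoms the boundary case of Theorem 3.2 can occur, and the check would have to allow equality.

```python
import itertools
import math
import random


def mean_cov(pts):
    """Trimmed mean and covariance (divisor = #points); covariance as (sxx, sxy, syy)."""
    h = len(pts)
    mx = sum(p[0] for p in pts) / h
    my = sum(p[1] for p in pts) / h
    sxx = sum((p[0] - mx) ** 2 for p in pts) / h
    syy = sum((p[1] - my) ** 2 for p in pts) / h
    sxy = sum((p[0] - mx) * (p[1] - my) for p in pts) / h
    return (mx, my), (sxx, sxy, syy)


def det_cov(c):
    sxx, sxy, syy = c
    return sxx * syy - sxy * sxy


def mahalanobis2(x, mu, c):
    """Squared Mahalanobis distance (x - mu)' C^{-1} (x - mu) in two dimensions."""
    sxx, sxy, syy = c
    dx, dy = x[0] - mu[0], x[1] - mu[1]
    return (syy * dx * dx - 2.0 * sxy * dx * dy + sxx * dy * dy) / det_cov(c)


def separation_margin(xs, gamma):
    """Exhaustive MCD, then (min distance outside) - (max distance inside) its subsample."""
    n = len(xs)
    h = math.ceil(n * gamma)
    best = min(itertools.combinations(range(n), h),
               key=lambda idx: det_cov(mean_cov([xs[i] for i in idx])[1]))
    mu, c = mean_cov([xs[i] for i in best])
    inside = [mahalanobis2(xs[i], mu, c) for i in best]
    outside = [mahalanobis2(xs[i], mu, c) for i in range(n) if i not in best]
    return min(outside) - max(inside)


if __name__ == "__main__":
    rng = random.Random(0)
    xs = [(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)) for _ in range(11)]
    print(separation_margin(xs, 0.6))
```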

6.2. Proofs of continuity (Section 4). The proof of Theorem 4.1 uses the following two lemmas.

Lemma 6.5 Suppose $P_0$ satisfies (3.1). Let $P_t \to P_0$ weakly and suppose that (4.1) holds. For $t \ge 1$, let $\psi_t \in K_t(\gamma)$ such that $\psi_t \le 1_{E_t}$, where $E_t = E(T_t(\psi_t), C_t(\psi_t), r_t(\psi_t))$, and suppose there exists $R > 0$ such that $\{\psi_t \ne 0\} \subset B_R$ for $t$ sufficiently large. Then

$$\int\psi_t\,dP_t - \int\psi_t\,dP_0 \to 0,\qquad T_t(\psi_t) - T_0(\psi_t) \to 0,\qquad\text{and}\qquad C_t(\psi_t) - C_0(\psi_t) \to 0.$$
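Proof: Because $\{\psi_t \ne 0\} \subset B_R$ eventually, we can write

$$\int\psi_t\,dP_t - \int\psi_t\,dP_0 = \int_{B_R}\psi_t\,d(P_t - P_0),$$

for $t$ sufficiently large. For the signed measure $Q_t = P_t - P_0$ write $Q_t = Q_t^+ - Q_t^-$, where $Q_t^+$ and $Q_t^-$ are positive measures on $\mathbb{R}^k$. According to (4.1),

$$\sup_{E\in\mathcal{E}}Q_t^+(E) + \sup_{E\in\mathcal{E}}Q_t^-(E) \le 2\sup_{E\in\mathcal{E}}|P_t(E) - P_0(E)| \to 0.$$

This implies $\sup_{E\in\mathcal{E}}Q_t^+(E) \to 0$ and $\sup_{E\in\mathcal{E}}Q_t^-(E) \to 0$. Because $0 \le \psi_t \le 1_{E_t}$, we find

$$0 \le \int_{B_R}\psi_t(x)\,Q_t^+(dx) \le Q_t^+(E_t \cap B_R) \le Q_t^+(E_t) \to 0,$$
$$0 \le \int_{B_R}\psi_t(x)\,Q_t^-(dx) \le Q_t^-(E_t \cap B_R) \le Q_t^-(E_t) \to 0,$$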
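which implies that

$$\int\psi_t\,dP_t - \int\psi_t\,dP_0 = \int_{B_R}\psi_t(x)\,Q_t^+(dx) - \int_{B_R}\psi_t(x)\,Q_t^-(dx) \to 0. \tag{6.13}$$

Now, write

$$T_t(\psi_t) - T_0(\psi_t) = \frac1{\int\psi_t\,dP_t}\int_{B_R}\psi_t(x)x\,(P_t - P_0)(dx) + \left(\frac1{\int\psi_t\,dP_t} - \frac1{\int\psi_t\,dP_0}\right)\int_{B_R}\psi_t(x)x\,P_0(dx). \tag{6.14}$$

The first term in (6.14) tends to zero, because $\gamma \le \int\psi_t\,dP_t \le 1$ and

$$0 \le \int_{B_R}\psi_t(x)\|x\|\,Q_t^+(dx) \le RQ_t^+(E_t \cap B_R) \le RQ_t^+(E_t) \to 0,$$
$$0 \le \int_{B_R}\psi_t(x)\|x\|\,Q_t^-(dx) \le RQ_t^-(E_t \cap B_R) \le RQ_t^-(E_t) \to 0,$$

which implies that

$$\left\|\int_{B_R}\psi_t(x)x\,(P_t - P_0)(dx)\right\| \le \int_{B_R}\psi_t(x)\|x\|\,Q_t^+(dx) + \int_{B_R}\psi_t(x)\|x\|\,Q_t^-(dx) \to 0.$$

The second term in (6.14) also tends to zero, because of (6.13) and the fact that

$$\left\|\int_{B_R}\psi_t(x)x\,P_0(dx)\right\| \le R.$$

It follows that $T_t(\psi_t) - T_0(\psi_t) \to 0$. Similarly, one proves $C_t(\psi_t) - C_0(\psi_t) \to 0$. ■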

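Lemma 6.6 Suppose $P_0$ satisfies (3.1). Let $P_t \to P_0$ weakly and suppose that (4.1) holds. For $t \ge 1$, let $\psi_t \in K_t(\gamma)$ such that $\psi_t \le 1_{E_t}$, where $E_t = E(T_t(\psi_t), C_t(\psi_t), r_t(\psi_t))$, and suppose there exists $R > 0$ such that $\{\psi_t \ne 0\} \subset B_R$ for $t$ sufficiently large. Then there exist a subsequence $t_m \to \infty$ and $\psi^* \in K_0(\gamma)$ such that

$$\lim_{m\to\infty}\left(T_0(\psi_{t_m}), C_0(\psi_{t_m})\right) = \left(T_0(\psi^*), C_0(\psi^*)\right).$$

Proof: Since $0 \le \psi_t \le 1$ and $\{\psi_t \ne 0\} \subset B_R$, the $\psi_t$ can be viewed as elements of the class

$$\mathcal{L}_0^R = \left\{\psi \in L^\infty(P_0) : 0 \le \psi \le 1,\ \{\psi \ne 0\} \subset B_R\right\},$$

which is a weak*-compact subset of $L^\infty(P_0|_{B_R})$. Hence, there exists a subsequence $(\psi_{t_m})$ that has a weak* limit in $\mathcal{L}_0^R$, say $\psi^*$. This means that for any $g \in L^1(P_0)$,

$$\lim_{m\to\infty}\int\psi_{t_m}g\,dP_0 = \int\psi^*g\,dP_0. \tag{6.15}$$

In particular, $\int\psi_{t_m}\,dP_0 \to \int\psi^*\,dP_0$. Because $\int\psi_t\,dP_t \ge \gamma$, together with Lemma 6.5 this implies

$$\int\psi^*\,dP_0 = \lim_{m\to\infty}\int\psi_{t_m}\,dP_0 \ge \gamma - \lim_{m\to\infty}\int\psi_{t_m}\,d(P_{t_m} - P_0) = \gamma,$$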
so that $\psi^* \in K_0(\gamma)$. Finally, since the supports of both $\psi_{t_m}$ and $\psi^*$ lie in $B_R$, it follows from (6.15) that

$$T_0(\psi_{t_m}) = \frac1{\int\psi_{t_m}\,dP_0}\int_{B_R}\psi_{t_m}(x)x\,P_0(dx) \to \frac1{\int\psi^*\,dP_0}\int_{B_R}\psi^*(x)x\,P_0(dx) = T_0(\psi^*),$$

and similarly $C_0(\psi_{t_m}) \to C_0(\psi^*)$. ■

Proof of Theorem 4.1: Consider the sequence $(T_t(\phi_t), C_t(\phi_t))$. According to Proposition 3.1 there exists $\lambda_0 > 0$ such that $\lambda_{\min}(C_t(\phi_t)) \ge \lambda_0$ for $t$ sufficiently large. Similar to the beginning of the proof of Proposition 3.2 we obtain $\lambda_{\max}(C_t(\phi_t)) \le \lambda_1$ (see (6.7)). Because $\psi_t \in K_t(\gamma)$, again according to Proposition 3.1, $\lambda_{\min}(C_t(\psi_t)) \ge \lambda_0$. Since $\det(C_t(\phi_t)) \le \lambda_1^k$ for $t$ sufficiently large, and $\det(C_t(\psi_t)) - \det(C_t(\phi_t))$ tends to zero, it follows that $\det(C_t(\psi_t)) \le 2\lambda_1^k$ eventually, so that according to Lemma 6.2, there exists a compact set which contains $(T_t(\psi_t), C_t(\psi_t))$ for $t$ sufficiently large. This means there exists a convergent subsequence.

Now, consider a subsequence, which we continue to denote by $(T_t(\psi_t), C_t(\psi_t))$, for which $(T_t(\psi_t), C_t(\psi_t)) \to (T_0, C_0)$. From Lemmas 6.5 and 6.6, we conclude that there exists a further subsequence $(t_m)$ such that

$$T_0 = \lim_{m\to\infty}T_{t_m}(\psi_{t_m}) = T_0(\psi^*),\qquad C_0 = \lim_{m\to\infty}C_{t_m}(\psi_{t_m}) = C_0(\psi^*),$$

for some $\psi^* \in K_0(\gamma)$. It remains to show that $(T_0, C_0)$ is an MCD functional, i.e., $\det(C_0(\psi^*))$ minimizes $\det(C_0(\phi))$ over $K_0(\gamma)$. To this end, suppose there exist $\delta > 0$ and $\phi \in K_0(\gamma)$ such that

$$\det(C_0(\phi)) < \det(C_0(\psi^*)) - \delta.$$

Since the set of bounded continuous functions is dense within $K_0^R(\gamma)$, we can construct a bounded continuous function $\psi \in K_0(\gamma)$ such that, for all $i, j = 1, 2, \ldots, k$,

$$\int|\psi - \phi|\,dP_0,\qquad \int|\psi(x)x_i - \phi(x)x_i|\,P_0(dx),\qquad\text{and}\qquad \int|\psi(x)x_ix_j - \phi(x)x_ix_j|\,P_0(dx)$$

can be made arbitrarily small. Hence, we can construct a bounded continuous function $\psi \in K_0(\gamma)$ such that

$$\det(C_0(\psi)) < \det(C_0(\psi^*)) - \delta/2.$$

Now, since $\psi(x)x$ is bounded and continuous on $B_R$, we have

$$T_{t_m}(\psi) = \frac1{\int\psi\,dP_{t_m}}\int\psi(x)x\,P_{t_m}(dx) \to \frac1{\int\psi\,dP_0}\int\psi(x)x\,P_0(dx) = T_0(\psi),$$

and similarly $C_{t_m}(\psi) \to C_0(\psi)$. Since also $\det(C_{t_m}(\psi_{t_m})) - \det(C_{t_m}(\phi_{t_m})) \to 0$, it would follow that

$$\lim_{m\to\infty}\det(C_{t_m}(\psi)) = \det(C_0(\psi)) < \det(C_0(\psi^*)) - \frac\delta2 = \lim_{m\to\infty}\det(C_{t_m}(\psi_{t_m})) - \frac\delta2 = \lim_{m\to\infty}\det(C_{t_m}(\phi_{t_m})) - \frac\delta2.$$

This would mean that for $m$ sufficiently large, $\det(C_{t_m}(\psi)) < \det(C_{t_m}(\phi_{t_m})) - \delta/4$, which contradicts the minimizing property of $\phi_{t_m}$. ■

Proof of Theorem 2.1: First note that when $\varepsilon \downarrow 0$, then $P_{r,\varepsilon} \to P$ weakly. Condition (4.1) automatically holds, and because $P(H) < (\gamma - \varepsilon)/(1 - \varepsilon) < \gamma$, condition (3.1) also holds. According to Theorem 3.1 this means that $\mathrm{MCD}_\gamma(P_{r,\varepsilon})$ exists for $\varepsilon > 0$ sufficiently small, and the minimizing $\phi_{r,\varepsilon} \in K_{P_{r,\varepsilon}}(\gamma)$. Hence, together with Theorem 3.2, all conditions of Theorem 4.1 are satisfied, which yields the first limit in (i). The proof of the second limit in (i) mimics the proof of Theorem 4.1. Note that although we are not dealing with a weakly convergent sequence of measures satisfying condition (4.1), we do have that for all continuous functions $f$ with bounded support,

$$\lim_{\|r\|\to\infty}\int f\,dP_{r,\varepsilon} = \int f\,d(1 - \varepsilon)P, \tag{6.16}$$

and for every fixed $R > 0$,

$$\lim_{\|r\|\to\infty}\ \sup_{E\in\mathcal{E},\,E\subset B_R}\left|P_{r,\varepsilon}(E) - (1 - \varepsilon)P(E)\right| = 0. \tag{6.17}$$

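We first show the analogue of Proposition 3.2, i.e., there exists $R > 0$ such that for all $\|r\|$ sufficiently large, the support of all minimizing $\phi$ for $P_{r,\varepsilon}$ lies in $B_R$.

Choose $R' > 0$ large enough such that $P(B_{R'}) > \gamma/(1 - \varepsilon)$. This shows that there exists $\psi \in K_{(1-\varepsilon)P}(\gamma)$ with support contained in $B_{R'}$, such that $\int\psi\,dP_{r,\varepsilon} \ge \gamma$ for all $r \in \mathbb{R}^k$, and from (6.16) we find

$$\lim_{\|r\|\to\infty}\left(T_{P_{r,\varepsilon}}(\psi), C_{P_{r,\varepsilon}}(\psi)\right) = \left(T_{(1-\varepsilon)P}(\psi), C_{(1-\varepsilon)P}(\psi)\right).$$

When we take $M = 2\det(C_{(1-\varepsilon)P}(\psi))$, there exists $r_0 > 0$ such that for all $r$ with $\|r\| > r_0$, $\det(C_{P_{r,\varepsilon}}(\psi)) < M$. It follows that if $\phi$ is a minimizing function for $P_{r,\varepsilon}$ at level $\gamma$, we can conclude that $\det(C_{P_{r,\varepsilon}}(\phi)) < M$ for $\|r\| > r_0$. Also, since $\int\phi\,dP \ge (\gamma - \varepsilon)/(1 - \varepsilon)$, Proposition 3.1 yields that there exists $\lambda_0 > 0$, not depending on $\phi$, such that for all $a \in \mathbb{R}^k$ with $\|a\| = 1$,

$$\int\left(a'(x - T_P(\phi))\right)^2\phi\,dP > \lambda_0.$$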
This implies that

$$\begin{aligned}
\int\left(a'(x - T_{P_{r,\varepsilon}}(\phi))\right)^2\phi\,dP_{r,\varepsilon} &\ge (1 - \varepsilon)\int\left(a'(x - T_{P_{r,\varepsilon}}(\phi))\right)^2\phi\,dP\\
&\ge (1 - \varepsilon)\int\left(a'(x - T_P(\phi))\right)^2\phi\,dP > (1 - \varepsilon)\lambda_0.
\end{aligned}$$

This means that for all minimizing $\phi$ for $P_{r,\varepsilon}$ we have a uniform lower bound on the smallest eigenvalue. From here on, we copy the proof of Lemma 6.2. We choose $R > 0$ and $\delta > 0$ (independent of $\phi$) such that $P_{r,\varepsilon}(B_R) \ge (1 - \varepsilon)P(B_R) \ge 1 - \gamma + \delta$ and $\int_{B_R}\phi\,dP_{r,\varepsilon} \ge \delta$. This then shows that there exist $\lambda_{\max}$ and $L > 0$ such that for all $r$ with $\|r\| > r_0$ and for all minimizing $\phi$, we have $\|T_{P_{r,\varepsilon}}(\phi)\| \le L$ and the largest eigenvalue of $C_{P_{r,\varepsilon}}(\phi)$ is smaller than $\lambda_{\max}$. Now we can follow the proof of Proposition 3.2, starting from (6.7), to conclude that there exist $R > 0$ and $r_0 > 0$ such that for all $r$ with $\|r\| > r_0$, the support of all minimizing $\phi$ for $P_{r,\varepsilon}$ lies within $B_R$.
Note that, because according to Proposition 3.2 all minimizing $\phi$ for $(1 - \varepsilon)P$ also have a fixed bounded support, this immediately yields statement (ii). Indeed, if $Q$ has bounded support, then for $\|r\|$ sufficiently large the contamination puts no mass on any fixed bounded set, so that $\int\phi\,dP_{r,\varepsilon} = (1 - \varepsilon)\int\phi\,dP$ for all $\phi$ with a fixed bounded support. Hence, for $\|r\|$ sufficiently large, $\phi$ is minimizing for $P_{r,\varepsilon}$ at level $\gamma$ if and only if $\phi$ is minimizing for $(1 - \varepsilon)P$ at level $\gamma$, which means that $\phi$ is minimizing for $P$ at level $\gamma/(1 - \varepsilon)$.

To finish the proof of (i), we follow the proof of Theorem 4.1 from the point of considering a convergent subsequence of $\mathrm{MCD}_\gamma(P_{r,\varepsilon})$. The conclusions of Lemmas 6.5 and 6.6 are still valid if we replace the condition of weak convergence by (6.16) and replace condition (4.1) by (6.17). This means that the proof of the second limit in (i) is completely similar to the remainder of the proof of Theorem 4.1, which proves (i). ■

Proof of Corollary 4.1: Since the MCD functional at $P_0$ is unique, it follows immediately from Theorem 4.1 that each convergent subsequence has the same limit point $(T_0(\phi_0), C_0(\phi_0))$, which proves part (i).

For $t = 1, 2, \ldots$, write $E_t(s) = E(T_t(\phi_t), C_t(\phi_t), s)$ and $\rho_t = r_t(\phi_t)$, as defined by (2.2) and (2.3), and write $E_0(s)$ and $\rho_0$ for the ellipsoid and radius corresponding to $\phi_0$. For any fixed $s > 0$, write

$$P_t(E_t(s)) = P_t(E_t(s)) - P_0(E_t(s)) + P_0(E_t(s)).$$

Because $P_0$ satisfies (4.1) and $(T_t(\phi_t), C_t(\phi_t)) \to (T_0(\phi_0), C_0(\phi_0))$, for any fixed $s > 0$,

$$P_0(E_t(s)) = P_0(E_t(s)^\circ) \to P_0(E_0(s)^\circ) = P_0(E_0(s)),$$

and according to (4.1), $P_t(E_t(s)) - P_0(E_t(s)) \to 0$. It follows that for any fixed $s > 0$,

$$P_t(E_t(s)) \to P_0(E_0(s)). \tag{6.18}$$

Now, let $\varepsilon > 0$. Then by definition of $\rho_0$, it follows that $P_0(E_0(\rho_0 - \varepsilon)) < \gamma$, and by assumption (4.2) we also have $P_0(E_0(\rho_0 + \varepsilon)) > \gamma$. From (6.18), we conclude that for $t$ sufficiently large,

$$P_t(E_t(\rho_0 - \varepsilon)) < \gamma < P_t(E_t(\rho_0 + \varepsilon)).$$

By definition of $\rho_t$ this means $\rho_0 - \varepsilon \le \rho_t \le \rho_0 + \varepsilon$. Since $\varepsilon > 0$ is arbitrary, this finishes the proof of part (ii). ■

Proof of Proposition 4.1: Let $\phi_n$ be a minimizing function for the MCD functional corresponding to $\mathbb{P}_n$. Then by definition

$$\det(C_n(\phi_n)) \le \det(C_n(1_{S_n})) = \det(C_n(S_n)).$$

First note that $\phi_n$ cannot be zero on the boundary of $E_n = E(T_n(\phi_n), C_n(\phi_n), r_n(\phi_n))$. Hence, according to Theorem 3.2, we either have $\phi_n = 1$ on $\partial E_n$, or there exists a point $x \in \partial E_n$ such that $\mathbb{P}_n(\{x\}) = \mathbb{P}_n(\partial E_n)$. In the first case

$$\det(C_n(S_n)) \le \det(C_n(\tilde S_n)) = \det(C_n(\phi_n)),$$

for the subsample $\tilde S_n = \{X_i : \phi_n(X_i) = 1\}$, which means $\det(C_n(S_n)) = \det(C_n(\phi_n))$.
Consider the other case. Suppose $\phi_n = 1$ in $k$ points other than $x$, and suppose there are $m$ sample points $X_i = x$. Then we must have $\gamma > k/n$ and $\phi_n(x) = \varepsilon_n$ for some $0 < \varepsilon_n \le 1$, where

$$\gamma = \int \phi_n \, dP_n = \frac{k}{n} + \frac{m\varepsilon_n}{n}.$$

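Now, let $\hat S_n$ be the subsample consisting of the $k$ points where $\phi_n = 1$ and $\lceil m\varepsilon_n\rceil$ points $X_i = x$. Then $\hat S_n$ has $nP_n(\hat S_n) = k + \lceil m\varepsilon_n\rceil = \lceil n\gamma\rceil$ points. According to Proposition 3.2, with probability one, there exists $R > 0$ such that $\hat S_n$ and $\{\phi_n > 0\}$ are contained in $B_R$. Since $\int \phi_n\,dP_n = \gamma$ and $nP_n(\hat S_n) = n\gamma + (\lceil m\varepsilon_n\rceil - m\varepsilon_n)$, a direct computation gives

$$T_n(1_{\hat S_n}) - T_n(\phi_n) = \frac{\lceil m\varepsilon_n\rceil - m\varepsilon_n}{\lceil n\gamma\rceil}\,\big(x - T_n(\phi_n)\big),$$

and because both $x$ and $T_n(\phi_n)$ lie in $B_R$ (the latter being a weighted average of points in $B_R$), this implies

$$\|T_n(1_{\hat S_n}) - T_n(\phi_n)\| \le \frac{2R}{\lceil n\gamma\rceil} = O(n^{-1}).$$

Similarly, $C_n(1_{\hat S_n}) - C_n(\phi_n) = O(n^{-1})$ and $\det(C_n(1_{\hat S_n})) - \det(C_n(\phi_n)) = O(n^{-1})$, with probability one. This means

$$\det(C_n(S_n)) \le \det(C_n(\hat S_n)) = \det(C_n(1_{\hat S_n})) = \det(C_n(\phi_n)) + O(n^{-1}),$$

with probability one. ■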



The proof of Proposition 4.2 relies partly on the following property. 

Lemma 6.7 Let $S_m$ be a subsample of size $m \ge 2$ and let $X^* \in S_m$ have maximal Mahalanobis distance with respect to the corresponding trimmed sample mean $T_m = T_n(S_m)$ and trimmed sample covariance $C_m = C_n(S_m)$, i.e.,

$$X^* = \mathop{\mathrm{argmax}}_{X_i \in S_m} \, (X_i - T_m)'C_m^{-1}(X_i - T_m).$$

Define the subsample $S_{m-1} = S_m \setminus \{X^*\}$ with trimmed sample covariance $C_{m-1} = C_n(S_{m-1})$. Then

$$\det(C_{m-1}) \le \det(C_m).$$

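Proof: We can write

$$\begin{aligned}
\det(C_{m-1}) &= \det\Big(\frac{1}{m-1}\sum_{X_i \in S_{m-1}} (X_i - T_{m-1})(X_i - T_{m-1})'\Big)\\
&\le \det\Big(\frac{1}{m-1}\sum_{X_i \in S_{m-1}} (X_i - T_m)(X_i - T_m)'\Big)\\
&= \det\Big(\frac{m}{m-1}C_m - \frac{1}{m-1}(X^* - T_m)(X^* - T_m)'\Big)\\
&= \det\Big(C_m + \frac{1}{m-1}\big(C_m - (X^* - T_m)(X^* - T_m)'\big)\Big),
\end{aligned}$$

where $T_{m-1} = T_n(S_{m-1})$, and the inequality holds because $\sum_{X_i \in S_{m-1}} (X_i - c)(X_i - c)'$ is minimized, in the positive semidefinite ordering, at $c = T_{m-1}$.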

From the definition of $C_m$, after multiplication with $C_m^{-1}$ and taking traces, we find

$$k = \frac{1}{m}\sum_{X_i \in S_m} (X_i - T_m)'C_m^{-1}(X_i - T_m).$$
Therefore, since $X^*$ has the largest value of $(X_i - T_m)'C_m^{-1}(X_i - T_m)$, which is at least the average value $k$, it follows that

$$\mathrm{tr}\big[I_k - C_m^{-1}(X^* - T_m)(X^* - T_m)'\big] = k - (X^* - T_m)'C_m^{-1}(X^* - T_m) \le 0.$$

The lemma now follows from Lemma 6.3. ■
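Lemma 6.7 and the trace identity behind it are easy to check numerically. The sketch below, for bivariate data ($k = 2$) and with hypothetical helper names, verifies that the average squared Mahalanobis distance within a subsample equals $k$, and then repeatedly removes the point with maximal Mahalanobis distance (the reduction step that the proof of Lemma 6.8 also relies on), recording that the determinant of the trimmed covariance never increases.

```python
import random

def mean(points):
    h = len(points)
    return (sum(p[0] for p in points) / h, sum(p[1] for p in points) / h)

def cov(points):
    """Trimmed covariance with divisor h (the subsample size)."""
    mx, my = mean(points)
    h = len(points)
    sxx = sum((p[0] - mx) ** 2 for p in points) / h
    syy = sum((p[1] - my) ** 2 for p in points) / h
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / h
    return ((sxx, sxy), (sxy, syy))

def det2(c):
    return c[0][0] * c[1][1] - c[0][1] * c[1][0]

def mahalanobis_sq(x, t, c):
    """(x - t)' c^{-1} (x - t) for a 2x2 positive definite matrix c."""
    (a, b), (_, d) = c
    dx, dy = x[0] - t[0], x[1] - t[1]
    return (d * dx * dx - 2 * b * dx * dy + a * dy * dy) / det2(c)

def reduce_subsample(points, h):
    """Apply the step of Lemma 6.7 until h points remain; return them and all dets."""
    points = list(points)
    dets = [det2(cov(points))]
    while len(points) > h:
        t, c = mean(points), cov(points)
        d2 = [mahalanobis_sq(p, t, c) for p in points]
        points.pop(d2.index(max(d2)))  # drop X*, the point with maximal distance
        dets.append(det2(cov(points)))
    return points, dets

rng = random.Random(1)
sample = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(30)]
t, c = mean(sample), cov(sample)
print(sum(mahalanobis_sq(p, t, c) for p in sample) / len(sample))  # k = 2 up to rounding
kept, dets = reduce_subsample(sample, 12)
print(len(kept), dets[0], dets[-1])
```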

Proof of Proposition 4.2: Suppose that there is a point $X_\ell \in E_n$ that is not in $S_n$. Because $S_n$ must always have at least one point on the boundary of $E_n$, we can then interchange a point $X_j \in S_n$ that lies on the boundary of $E_n$ with $X_\ell$. We will show that this always decreases $\det(C_n(S_n))$. Let $S_n^* = (S_n \setminus \{X_j\}) \cup \{X_\ell\}$. Then

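$$T_n(S_n^*) = \frac{1}{h_n}\sum_{X_i \in S_n^*} X_i = T_n(S_n) + \frac{1}{h_n}(X_\ell - X_j),$$

where $h_n$ denotes the number of points in $S_n$.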

Therefore (with a strict inequality), similar to (6.12), we have

$$\det(C_n(S_n^*)) < \det\Big(C_n(S_n) - \frac{1}{h_n}(X_j - T_n(S_n))(X_j - T_n(S_n))' + \frac{1}{h_n}(X_\ell - T_n(S_n))(X_\ell - T_n(S_n))'\Big).$$

Because $X_j$ is on the boundary of $E_n$ and $X_\ell$ lies inside $E_n$, we have

$$(X_\ell - T_n(S_n))'C_n(S_n)^{-1}(X_\ell - T_n(S_n)) - (X_j - T_n(S_n))'C_n(S_n)^{-1}(X_j - T_n(S_n)) \le 0.$$

Therefore, it follows from Lemma 6.3 that $\det(C_n(S_n^*)) < \det(C_n(S_n))$, which contradicts the minimizing property of $S_n$. We conclude that $\{X_1, \ldots, X_n\} \cap E_n \subset S_n$. Since, according to Lemma 6.7, the subsample $S_n$ has exactly $\lceil n\gamma\rceil$ points, and by definition $E_n$ contains at least $\lceil n\gamma\rceil$ points, we conclude that $\{X_1, \ldots, X_n\} \cap E_n = S_n$. ■
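For tiny samples, the separating ellipsoid property of Proposition 4.2 can be observed directly. The sketch below assumes bivariate data and a fixed subsample size $h$ in place of $\lceil n\gamma\rceil$. It finds an MCD subsample by exhaustive search over all $h$-subsets (our illustration only, feasible just for very small $n$, not an algorithm from the paper) and checks that the sample points inside the ellipsoid centered at $T_n(S_n)$ with shape $C_n(S_n)$, whose radius is the largest Mahalanobis distance within $S_n$, are exactly the points of $S_n$. The helper names `brute_force_mcd` and `separating_ellipsoid` are hypothetical.

```python
import itertools
import random

def mean(points):
    h = len(points)
    return (sum(p[0] for p in points) / h, sum(p[1] for p in points) / h)

def cov(points):
    mx, my = mean(points)
    h = len(points)
    sxx = sum((p[0] - mx) ** 2 for p in points) / h
    syy = sum((p[1] - my) ** 2 for p in points) / h
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / h
    return ((sxx, sxy), (sxy, syy))

def det2(c):
    return c[0][0] * c[1][1] - c[0][1] * c[1][0]

def mahalanobis_sq(x, t, c):
    (a, b), (_, d) = c
    dx, dy = x[0] - t[0], x[1] - t[1]
    return (d * dx * dx - 2 * b * dx * dy + a * dy * dy) / det2(c)

def brute_force_mcd(points, h):
    """Index set of an h-subset minimizing det of the trimmed covariance."""
    return set(min(itertools.combinations(range(len(points)), h),
                   key=lambda idx: det2(cov([points[i] for i in idx]))))

def separating_ellipsoid(points, subset, tol=1e-12):
    """Indices of all sample points in the smallest ellipsoid, centered at T_n(S_n)
    with shape C_n(S_n), that contains every point of the subset."""
    s = [points[i] for i in subset]
    t, c = mean(s), cov(s)
    d2 = [mahalanobis_sq(p, t, c) for p in points]
    r2 = max(d2[i] for i in subset)  # squared radius: largest distance within S_n
    return {i for i, v in enumerate(d2) if v <= r2 + tol}

rng = random.Random(3)
pts = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(8)] + [(6.0, 6.0), (-5.0, 7.0)]
best = brute_force_mcd(pts, 6)
print(sorted(best), separating_ellipsoid(pts, best) == best)
```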

Lemma 6.8 Suppose $P_0$ satisfies (3.1). With probability one, there exist $R > 0$ and $n_0 \ge 1$ such that for all $n \ge n_0$ and all subsamples $S_n$ with at least $n\gamma$ points, there exists a subsample $S_n^*$ with exactly $\lceil n\gamma\rceil$ points contained in $B_R$ such that

$$\det(C_n(S_n^*)) \le \det(C_n(S_n)).$$

Proof: The proof is along the lines of the proof of Proposition 3.2. We first choose $R' > 0$ and construct a subsample $S_{n0} \subset B_{R'}$ with at least $n\gamma$ points, for which $\det(C_n(S_{n0}))$ is uniformly bounded for $n$ sufficiently large. By the law of large numbers, $P_n \to P_0$ weakly with probability one. Hence, we can choose $R' > 0$ such that for all $n \ge 1$,

$$P_n(B_{R'}) > \max\{1 - \gamma/2,\ \gamma + (1-\gamma)/2\},$$

with probability one, and define the subsample $S_{n0} = \{X_i : X_i \in B_{R'}\}$. Then $S_{n0} \subset B_{R'}$ and, because $P_n(B_{R'}) > \gamma$, it contains at least $n\gamma$ points. Because $1_{S_{n0}} = 1_{B_{R'}}$ $P_n$-almost everywhere, according to Proposition 3.1, with probability one, there exists $\lambda_0 > 0$ such that

$$\lambda_{\min}(C_n(1_{S_{n0}})) \ge \lambda_0,$$

for $n$ sufficiently large. Define $D_0 = 2\det(C_0(1_{B_{R'}}))$. From (4.1) we have $P_n(B_{R'}) \to P_0(B_{R'})$ with probability one, and since the functions $x$ and $xx'$ are bounded and continuous on $B_{R'}$, we also have

$$\int_{B_{R'}} x\,dP_n \to \int_{B_{R'}} x\,dP_0 \quad\text{and}\quad \int_{B_{R'}} xx'\,dP_n \to \int_{B_{R'}} xx'\,dP_0,$$

with probability one. Hence, together with (4.3), it follows that for $n$ sufficiently large,

$$\det(C_n(S_{n0})) = \det(C_n(1_{S_{n0}})) = \det(C_n(1_{B_{R'}})) < D_0,$$
with probability one. Now, let $S_n$ be a subsample with $h_n \ge n\gamma$ points. According to Lemma 6.7, without loss of generality, we may assume that it has exactly $\lceil n\gamma\rceil$ points. When $\det(C_n(S_n)) \ge D_0$, we are done, because the subsample $S_{n0}$ has a smaller determinant, is contained in $B_{R'}$, and, according to Lemma 6.7, can be reduced if necessary to exactly $\lceil n\gamma\rceil$ points without increasing the determinant. So suppose that $S_n$ has $\lceil n\gamma\rceil$ points and $\det(C_n(S_n)) < D_0$. From here on the proof is identical to that of Proposition 3.2 and is left to the reader. ■

Proof of Theorem 4.2: With probability one, $P_n \to P_0$ weakly and (4.1) holds, since the class of ellipsoids has polynomial discrimination, i.e., forms a Vapnik-Červonenkis class. According to (4.3) the MCD estimators can be written as MCD functionals with trimming function $\phi_n = 1_{S_n}$. From Propositions 4.1 and 4.2 together with Lemma 6.8, it follows that $\phi_n$ satisfies the conditions of Theorem 4.1 with probability one, which proves the theorem. ■

6.3. Proofs of asymptotic normality and IF (Section 5). The proof of Theorem 5.1 relies 
on the following result from [17], which we state for easy reference. 

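Theorem 6.1 (Pollard, 1984) Let $\mathcal{F}$ be a permissible class of real-valued functions with envelope $H \ge |\phi|$ for all $\phi \in \mathcal{F}$, and suppose that $0 < E[H(X)^2] < \infty$. If the class of graphs of functions in $\mathcal{F}$ has polynomial discrimination, then for each $\eta > 0$ and $\epsilon > 0$ there exists $\delta > 0$ such that

$$\limsup_{n\to\infty} P\Big\{\sup_{(\phi_1,\phi_2)\in[\delta]} \Big| n^{1/2}\int (\phi_1 - \phi_2)\,d(P_n - P_0)\Big| > \eta\Big\} < \epsilon,$$

where $[\delta] = \{(\phi_1, \phi_2) : \phi_1, \phi_2 \in \mathcal{F} \text{ and } \int (\phi_1 - \phi_2)^2\,dP_0 < \delta^2\}$.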

The theorem is not stated as such in [17], but it is a combination of the Approximation Lemma (p. 27), Lemma II.36 (p. 34) and the Equicontinuity Lemma (p. 150). The polynomial discrimination of $\mathcal{F}$ provides a suitable bound on the entropy of $\mathcal{F}$ (Approximation Lemma together with Lemma II.36). The stochastic equicontinuity stated in Theorem 6.1 is then a consequence of the fact that the entropy of $\mathcal{F}$ is small enough (Equicontinuity Lemma). The classes of functions we will encounter in this way can always be indexed by the parameter set $\mathbb{R}^k \times \mathrm{PDS}(k) \times \mathbb{R}_+$, and are easily seen to be permissible in the sense of Pollard [17].

Proof of Theorem 5.1: Consider equation (5.4) and define

$$\mathcal{F} = \{1_{\{\|G^{-1}(x - m)\| \le \rho\}} : m \in \mathbb{R}^k,\ G \in \mathrm{PDS}(k),\ \rho > 0\}.$$
As a subclass of the class of indicator functions of all ellipsoids, the class of graphs of functions in $\mathcal{F}$ has polynomial discrimination, and obviously $\mathcal{F}$ has envelope $H = 1$. Hence, Theorem 6.1 applies to $\Psi_3$. For the real-valued components of $\Psi_1$ and $\Psi_2$, use that there exists $R > 0$ such that for $n = 0$ and $n$ sufficiently large

$$\{x \in \mathbb{R}^k : \|\Gamma_n^{-1}(x - \mu_n)\| \le \rho_n\} \subset B_R.$$

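This means that for all $i, j = 1, 2, \ldots, k$, the classes

$$\mathcal{F}_i = \{x_i 1_{\{\|G^{-1}(x - m)\| \le \rho\} \cap B_R} : m \in \mathbb{R}^k,\ G \in \mathrm{PDS}(k),\ \rho > 0\},$$
$$\mathcal{F}_{ij} = \{x_i x_j 1_{\{\|G^{-1}(x - m)\| \le \rho\} \cap B_R} : m \in \mathbb{R}^k,\ G \in \mathrm{PDS}(k),\ \rho > 0\},$$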

have uniformly bounded envelopes. According to Lemma 3 in [14], the corresponding classes of graphs have polynomial discrimination. Therefore, Theorem 6.1 also applies to the components of $\Psi_1$ and $\Psi_2$. It follows that

$$0 = \Lambda(\theta_n) + \int \Psi(y, \theta_0)\,(P_n - P_0)(dy) + o_P(n^{-1/2}).$$

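Now, $\Lambda(\theta_0) = 0$ and, since $\Psi(y, \theta_0)$ has bounded support, the term

$$\int \Psi(y, \theta_0)\,(P_n - P_0)(dy) = \frac{1}{n}\sum_{i=1}^n \big(\Psi(X_i, \theta_0) - E\Psi(X_i, \theta_0)\big)$$

behaves according to the central limit theorem and is therefore of order $O_P(n^{-1/2})$. Because $\theta_n \to \theta_0$ with probability one, according to Theorem 4.2, we find

$$0 = \Lambda'(\theta_0)(\theta_n - \theta_0) + O_P(n^{-1/2}) + o_P(\|\theta_n - \theta_0\|).$$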

Because $\Lambda'(\theta_0)$ is nonsingular, this gives $\|\theta_n - \theta_0\| = O_P(n^{-1/2})$, and inserting this, we conclude that

$$\Lambda'(\theta_0)(\theta_n - \theta_0) = -\frac{1}{n}\sum_{i=1}^n \big(\Psi(X_i, \theta_0) - E\Psi(X_i, \theta_0)\big) + o_P(n^{-1/2}),$$
which proves the first statement. For the second statement note that 

/ i^EM - My)) \\yf Pn{y) = o{n-'), 

with probability one. This follows from the characterization given in Theorem 3.2 and the fact that $P_0$ satisfies (5.7). This means that the MCD functional $\theta_n$ also satisfies equation (5.4). From here on the argument is the same as before, which proves the theorem. ■
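Theorem 5.1 asserts a $\sqrt{n}$ rate. As a rough Monte Carlo illustration that is not part of the paper, the sketch below uses the univariate case, where an MCD subsample of size $h$ can be taken as a contiguous block of the sorted sample (a known property of the univariate MCD, which we assume here), so a sliding window over prefix sums suffices. Under a $\sqrt{n}$ rate, quadrupling $n$ should roughly halve the spread of the location estimate across replications. The level $\gamma = 0.75$, the sample sizes, the seed and the number of replications are arbitrary choices, and `univariate_mcd` and `location_sd` are hypothetical names.

```python
import math
import random
import statistics

def univariate_mcd(xs, h):
    """(location, scatter) of the h-window of the sorted sample with smallest variance."""
    s = sorted(xs)
    p1, p2 = [0.0], [0.0]  # prefix sums of x and x^2
    for v in s:
        p1.append(p1[-1] + v)
        p2.append(p2[-1] + v * v)
    best = None
    for i in range(len(s) - h + 1):
        loc = (p1[i + h] - p1[i]) / h
        var = (p2[i + h] - p2[i]) / h - loc * loc
        if best is None or var < best[1]:
            best = (loc, var)
    return best

def location_sd(n, gamma=0.75, reps=300, seed=7):
    """Monte Carlo standard deviation of the MCD location for N(0, 1) samples of size n."""
    rng = random.Random(seed)
    h = math.ceil(n * gamma)  # exact here because 0.75 * n is exactly representable
    estimates = [univariate_mcd([rng.gauss(0.0, 1.0) for _ in range(n)], h)[0]
                 for _ in range(reps)]
    return statistics.pstdev(estimates)

sd_100, sd_400 = location_sd(100), location_sd(400)
print(sd_100 / sd_400)                                   # expected to be near 2
print(math.sqrt(100) * sd_100, math.sqrt(400) * sd_400)  # expected to be similar
```

The scaled spreads $\sqrt{n}\,\mathrm{sd}$ should be of comparable size, which is what the $O_P(n^{-1/2})$ statement predicts; the exact values depend on the simulation settings.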

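Proof of Theorem 5.2: Consider expansion (5.9) and write $E_0 = E(\mu_0, \Sigma_0, \rho_0)$. Because, according to Theorem 4.1, $(\mu_{\varepsilon,x}, \Gamma_{\varepsilon,x}, \rho_{\varepsilon,x}) \to (\mu_0, \Gamma_0, \rho_0)$ as $\varepsilon \downarrow 0$, for $x \notin \partial E_0$ we get $\phi_{\varepsilon,x}(x) \to 1_{E_0}(x)$ and hence

$$\lim_{\varepsilon \downarrow 0} \Psi_\varepsilon(x) = \Psi(x, \theta_0),$$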






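with $\Psi$ defined in (5.3). Because $1_{E_0^\circ} \le \phi_0 \le 1_{E_0}$, it follows from (5.8) that $\Lambda(\theta_0) = 0$. As $\Lambda$ has a nonsingular derivative at $\theta_0$, we find from (5.9),

$$0 = (1 - \varepsilon)\Lambda'(\theta_0)(\theta_{\varepsilon,x} - \theta_0) + \varepsilon\Psi_\varepsilon(x) + o(\|\theta_{\varepsilon,x} - \theta_0\|),$$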

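from which we first deduce that $\theta_{\varepsilon,x} - \theta_0 = O(\varepsilon)$, and then obtain the influence function:

$$\mathrm{IF}(x; \theta, P_0) = \lim_{\varepsilon \downarrow 0} \frac{\theta_{\varepsilon,x} - \theta_0}{\varepsilon} = -\Lambda'(\theta_0)^{-1}\Psi(x, \theta_0).$$

■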
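The influence function $-\Lambda'(\theta_0)^{-1}\Psi(x, \theta_0)$ is bounded because $\Psi(\cdot, \theta_0)$ has bounded support. A finite-sample analogue is the sensitivity curve $(n+1)\{T(X_1, \ldots, X_n, x) - T(X_1, \ldots, X_n)\}$. The sketch below computes it for the univariate MCD location, keeping the subsample size $h$ fixed when the extra point is added (a simplifying assumption), and illustrates that a sufficiently remote point $x$ has no effect at all. The data, $h$, the grid of $x$ values and the helper names are hypothetical.

```python
import random

def univariate_mcd_location(xs, h):
    """Mean of the contiguous h-window of the sorted sample with the smallest variance."""
    s = sorted(xs)
    best_loc, best_var = None, None
    for i in range(len(s) - h + 1):
        w = s[i:i + h]
        loc = sum(w) / h
        var = sum((v - loc) ** 2 for v in w) / h
        if best_var is None or var < best_var:
            best_loc, best_var = loc, var
    return best_loc

def sensitivity_curve(sample, x, h):
    """(n + 1) times the change in the MCD location caused by adding the point x."""
    n = len(sample)
    return (n + 1) * (univariate_mcd_location(sample + [x], h)
                      - univariate_mcd_location(sample, h))

rng = random.Random(11)
data = [rng.gauss(0.0, 1.0) for _ in range(40)]
H = 30
for x in (-1e6, -5.0, -1.0, 0.0, 1.0, 5.0, 1e6):
    print(x, round(sensitivity_curve(data, x, H), 3))
```

Points far from the bulk of the data never enter the optimal window, so the estimate is unchanged; for moderate $x$ the added point can enter the window and shift the estimate by a bounded amount.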